SlideShare a Scribd company logo
1 of 12
TM

Parallel Concepts
Dr. C.V. Suresh Babu
TM

The Goal of Parallelization
• Reduction of elapsed time of a program
• Reduction in turnaround time of jobs
cpu
time

1 processor

communication
overhead

4 processors

• Overhead:
–
–
–
–
–

total increase in cpu time
communication
synchronization
additional work in algorithm
non-parallel part of the program
• (one processor works, others spin idle)

pr
oc
s

finish

2

Elapsed time

8 proc
s
4 p
roc
s

start

1

or
s
es
c
ro
p

Reduction in
elapsed time

Elapsed time
TM

Speedup and Efficiency
Both measure the parallelization properties of a program
• Let T(p) be the elapsed time on p processors
• The Speedup S(p) and the Efficiency E(p) are defined as:
S(p) = T(1)/T(p)
E(p) = S(p)/p
• for ideal parallel speedup we get:
Speedup

ideal

T(p) = T(1)/p
S(p) = T(1)/T(p) = p
E(p) = S(p)/p = 1 or 100%
Efficiency
1

Super-linear
Saturation
Disaster
Number of processors

Number of processors
Amdahl’s Law
This rule states the following for parallel programs:
The non-parallel fraction of the code (I.e. overhead)
imposes the upper limit on the scalability of the code
• the non-parallel (serial) fraction s of the program includes the
(1)
1 = s + f
! program has serial
communication and synchronization overhead and parallel fractions
(2)
(3)
(4)

T(1) =
=
=
T(p) =
S(p) =
=
<

(5)

T(parallel) + T(serial)
T(1) *(f + s)
T(1) *(f + (1-f))
T(1) *(f/p + (1-f))
T(1)/T(p)
1/(f/p + 1-f)
1/(1-f)
! for p-> inf.

S(p) < 1/(1-f)

TM
Amdahl’s Law: Time to Solution

T(p) = T(1)/S(p)
S(p) = 1/(f/p + (1-f))

Hypothetical program run time as function of #processors for several
parallel fractions f. Note the log-log plot

TM
TM

Fine-Grained Vs Coarse-Grained
• Fine-grain parallelism (typically loop level)
–
–
–
–

can be done incrementally, one loop at a time
does not require deep knowledge of the code
a lot of loops have to be parallel for decent speedup
potentially many synchronization points
MAIN
(at the end of each parallel loop)
A
E

B

F

C

G
K

• Coarse-grain parallelism
– make larger loops parallel at higher call-tree level
potentially in-closing many small loops
– more code is parallel at once
– fewer synchronization points, reducing overhead
– requires deeper knowledge of the code

H

L
p

Coarse-grained
D
I

J
N

M

O

q
r

s
t

Fine-grained
TM

Other Impediments to Scalability
Load imbalance:

p0
p1
p2
p3

• the time to complete a parallel
execution of a code segment is
start
determined by the longest running thread

Elapsed time

finish

• unequal work load distribution leads to some processors being
idle, while others work too much

with coarse grain parallelization, more opportunities for load
imbalance exist

Too many synchronization points
• compiler will put synchronization points at the start and exit of
each parallel region
Computing π with DPL
π=

1

4 dx
(1+x2)
0

=Σ
0<i<N

4
N(1+((i+0.5)/N)2)

PROGRAM PIPROG
INTEGER, PARAMETER:: N = 1000000
REAL (KIND=8):: LS,PI, W = 1.0/N
PI = SUM( (/ (4.0*W/(1.0+((I+0.5)*W)**2),I=1,N) /) )
PRINT *, PI
END

Notes:
–
–
–
–
–

essentially sequential form
automatic detection of parallelism
automatic work sharing
all variables shared by default
number of processors specified outside of the code

compile with:

TM
Computing π with Shared Memory
π=

1

4 dx
(1+x2)
0

=Σ
0<i<N

4
N(1+((i+0.5)/N)2)

#define n 1000000
main()
{
double pi, l, ls = 0.0, w = 1.0/n;
int i;
#pragma omp parallel for private(i,l) reduction(+:ls)
for(i=0; i<n; i++) {
l = (i+0.5)*w;
ls += 4.0/(1.0+l*l);
}
printf(“pi is %fn”,ls*w);
}

Notes:
– essentially sequential form
– automatic work sharing

TM
Computing π with Message Passing
1
#include <mpi.h>
4 dx
#define N 1000000
(1+x2)
main()
0
{
double pi, l, ls = 0.0, w = 1.0/N;

π=

=Σ
0<i<N

4
N(1+((i+0.5)/N)2)

int i, mid, nth;

MPI_init(&argc, &argv);
MPI_comm_rank(MPI_COMM_WORLD,&mid);
MPI_comm_size(MPI_COMM_WORLD,&nth);

}

for(i=mid; i<N; i += nth) {
l = (i+0.5)*w;
ls += 4.0/(1.0+l*l);
}
MPI_reduce(&ls,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
if(mid == 0) printf(“pi is %fn”,pi*w);
MPI_finalize();

Notes:

TM
Comparing Parallel Paradigms
• Automatic parallelization combined with explicit Shared Variable
programming (compiler directives) used on machines with global
memory

– Symmetric Multi-Processors, CC-NUMA, PVP
– These methods collectively known as Shared Memory Programming (SMP)
– SMP programming model works at loop level, and coarse level parallelism:
• the coarse level parallelism has to be specified explicitly
• loop level parallelism can be found by the compiler (implicitly)

– Explicit Message Passing Methods are necessary with machines that
have no global memory addressability:
• clusters of all sort, NOW & COW

– Message Passing Methods require coarse level parallelism to be scalable

•Choosing programming model is largely a matter of the application,
personal preference and the target machine.

•it has nothing to do with scalability.
limitations:
– communication overhead
– process synchronization

Scalability

function of
•scalability is mainly aparallelism the hardware and (your)
implementation of the

TM
Summary

TM

• The serial part or the communication overhead of the code limits the
scalability of the code (Amdahl Law)
• programs have to be >99% parallel to use large (>30 proc) machines
• several Programming Models are in use today:
– Shared Memory programming (SMP) (with Automatic Compiler
parallelization, Data-Parallel and explicit Shared Memory models)
– Message Passing model
• Choosing a Programming Model is largely a matter of the application,
personal choice and target machine. It has nothing to do with scalability.
– Don’t confuse Algorithm and implementation
• machines with a global address space can run applications based on
both, SMP and Message Passing programming models

More Related Content

What's hot

Computer Organozation
Computer OrganozationComputer Organozation
Computer OrganozationAabha Tiwari
 
Peephole optimization techniques in compiler design
Peephole optimization techniques in compiler designPeephole optimization techniques in compiler design
Peephole optimization techniques in compiler designAnul Chaudhary
 
Embedded programming u3 part 1
Embedded programming u3 part 1Embedded programming u3 part 1
Embedded programming u3 part 1Karthik Vivek
 
Performance measures
Performance measuresPerformance measures
Performance measuresDivya Tiwari
 
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Dr.K. Thirunadana Sikamani
 
ONNC - 0.9.1 release
ONNC - 0.9.1 releaseONNC - 0.9.1 release
ONNC - 0.9.1 releaseLuba Tang
 
Unit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processingUnit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processingvishal choudhary
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
RuntimeenvironmentAnusuya123
 
12109 microprocessor & programming
12109 microprocessor & programming12109 microprocessor & programming
12109 microprocessor & programmingGaurang Thakar
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Iffat Anjum
 
parallel language and compiler
parallel language and compilerparallel language and compiler
parallel language and compilerVignesh Tamil
 
Move Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next LevelMove Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next LevelIntel® Software
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInteX Research Lab
 
MLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performanceMLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performancejemin lee
 

What's hot (19)

Computer Organozation
Computer OrganozationComputer Organozation
Computer Organozation
 
Lecture7
Lecture7Lecture7
Lecture7
 
Peephole optimization techniques in compiler design
Peephole optimization techniques in compiler designPeephole optimization techniques in compiler design
Peephole optimization techniques in compiler design
 
Lecture6
Lecture6Lecture6
Lecture6
 
Embedded programming u3 part 1
Embedded programming u3 part 1Embedded programming u3 part 1
Embedded programming u3 part 1
 
Performance measures
Performance measuresPerformance measures
Performance measures
 
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
Instruction Level Parallelism Compiler optimization Techniques Anna Universit...
 
ONNC - 0.9.1 release
ONNC - 0.9.1 releaseONNC - 0.9.1 release
ONNC - 0.9.1 release
 
Unit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processingUnit 3-pipelining &amp; vector processing
Unit 3-pipelining &amp; vector processing
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
 
TensorRT survey
TensorRT surveyTensorRT survey
TensorRT survey
 
12109 microprocessor & programming
12109 microprocessor & programming12109 microprocessor & programming
12109 microprocessor & programming
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
 
parallel language and compiler
parallel language and compilerparallel language and compiler
parallel language and compiler
 
Move Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next LevelMove Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next Level
 
Caap presentation by me
Caap presentation by meCaap presentation by me
Caap presentation by me
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer Architecture
 
MLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performanceMLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performance
 

Similar to Parallel concepts1

Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxkrnaween
 
3. Potential Benefits, Limits and Costs of Parallel Programming.pdf
3. Potential Benefits, Limits and Costs of Parallel Programming.pdf3. Potential Benefits, Limits and Costs of Parallel Programming.pdf
3. Potential Benefits, Limits and Costs of Parallel Programming.pdfMohamedAymen14
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxjohnsmith96441
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computingVajira Thambawita
 
Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computingVajira Thambawita
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisShah Zaib
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)Sudarshan Mondal
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.pptHarshitPal37
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterSudhang Shankar
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureMani Goswami
 

Similar to Parallel concepts1 (20)

Lecture1
Lecture1Lecture1
Lecture1
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Parallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptxParallel Computing-Part-1.pptx
Parallel Computing-Part-1.pptx
 
3. Potential Benefits, Limits and Costs of Parallel Programming.pdf
3. Potential Benefits, Limits and Costs of Parallel Programming.pdf3. Potential Benefits, Limits and Costs of Parallel Programming.pdf
3. Potential Benefits, Limits and Costs of Parallel Programming.pdf
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
 
Chap5 slides
Chap5 slidesChap5 slides
Chap5 slides
 
Lecture5
Lecture5Lecture5
Lecture5
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
 
Matrix multiplication
Matrix multiplicationMatrix multiplication
Matrix multiplication
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)
 
CUDA
CUDACUDA
CUDA
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
Esctp snir
Esctp snirEsctp snir
Esctp snir
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
 

More from Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Clustering
ClusteringClustering
Clustering
 
Classification
ClassificationClassification
Classification
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
Introduction to regression
Introduction to regressionIntroduction to regression
Introduction to regression
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 

Recently uploaded

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 

Recently uploaded (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 

Parallel concepts1

  • 2. TM The Goal of Parallelization • Reduction of elapsed time of a program • Reduction in turnaround time of jobs cpu time 1 processor communication overhead 4 processors • Overhead: – – – – – total increase in cpu time communication synchronization additional work in algorithm non-parallel part of the program • (one processor works, others spin idle) pr oc s finish 2 Elapsed time 8 proc s 4 p roc s start 1 or s es c ro p Reduction in elapsed time Elapsed time
  • 3. TM Speedup and Efficiency Both measure the parallelization properties of a program • Let T(p) be the elapsed time on p processors • The Speedup S(p) and the Efficiency E(p) are defined as: S(p) = T(1)/T(p) E(p) = S(p)/p • for ideal parallel speedup we get: Speedup ideal T(p) = T(1)/p S(p) = T(1)/T(p) = p E(p) = S(p)/p = 1 or 100% Efficiency 1 Super-linear Saturation Disaster Number of processors Number of processors
  • 4. Amdahl’s Law This rule states the following for parallel programs: The non-parallel fraction of the code (I.e. overhead) imposes the upper limit on the scalability of the code • the non-parallel (serial) fraction s of the program includes the (1) 1 = s + f ! program has serial communication and synchronization overhead and parallel fractions (2) (3) (4) T(1) = = = T(p) = S(p) = = < (5) T(parallel) + T(serial) T(1) *(f + s) T(1) *(f + (1-f)) T(1) *(f/p + (1-f)) T(1)/T(p) 1/(f/p + 1-f) 1/(1-f) ! for p-> inf. S(p) < 1/(1-f) TM
  • 5. Amdahl’s Law: Time to Solution T(p) = T(1)/S(p) S(p) = 1/(f/p + (1-f)) Hypothetical program run time as function of #processors for several parallel fractions f. Note the log-log plot TM
  • 6. TM Fine-Grained Vs Coarse-Grained • Fine-grain parallelism (typically loop level) – – – – can be done incrementally, one loop at a time does not require deep knowledge of the code a lot of loops have to be parallel for decent speedup potentially many synchronization points MAIN (at the end of each parallel loop) A E B F C G K • Coarse-grain parallelism – make larger loops parallel at higher call-tree level potentially in-closing many small loops – more code is parallel at once – fewer synchronization points, reducing overhead – requires deeper knowledge of the code H L p Coarse-grained D I J N M O q r s t Fine-grained
  • 7. TM Other Impediments to Scalability Load imbalance: p0 p1 p2 p3 • the time to complete a parallel execution of a code segment is start determined by the longest running thread Elapsed time finish • unequal work load distribution leads to some processors being idle, while others work too much with coarse grain parallelization, more opportunities for load imbalance exist Too many synchronization points • compiler will put synchronization points at the start and exit of each parallel region
  • 8. Computing π with DPL π= 1 4 dx (1+x2) 0 =Σ 0<i<N 4 N(1+((i+0.5)/N)2) PROGRAM PIPROG INTEGER, PARAMETER:: N = 1000000 REAL (KIND=8):: LS,PI, W = 1.0/N PI = SUM( (/ (4.0*W/(1.0+((I+0.5)*W)**2),I=1,N) /) ) PRINT *, PI END Notes: – – – – – essentially sequential form automatic detection of parallelism automatic work sharing all variables shared by default number of processors specified outside of the code compile with: TM
  • 9. Computing π with Shared Memory π= 1 4 dx (1+x2) 0 =Σ 0<i<N 4 N(1+((i+0.5)/N)2) #define n 1000000 main() { double pi, l, ls = 0.0, w = 1.0/n; int i; #pragma omp parallel for private(i,l) reduction(+:ls) for(i=0; i<n; i++) { l = (i+0.5)*w; ls += 4.0/(1.0+l*l); } printf(“pi is %fn”,ls*w); } Notes: – essentially sequential form – automatic work sharing TM
  • 10. Computing π with Message Passing 1 #include <mpi.h> 4 dx #define N 1000000 (1+x2) main() 0 { double pi, l, ls = 0.0, w = 1.0/N; π= =Σ 0<i<N 4 N(1+((i+0.5)/N)2) int i, mid, nth; MPI_init(&argc, &argv); MPI_comm_rank(MPI_COMM_WORLD,&mid); MPI_comm_size(MPI_COMM_WORLD,&nth); } for(i=mid; i<N; i += nth) { l = (i+0.5)*w; ls += 4.0/(1.0+l*l); } MPI_reduce(&ls,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD); if(mid == 0) printf(“pi is %fn”,pi*w); MPI_finalize(); Notes: TM
  • 11. Comparing Parallel Paradigms • Automatic parallelization combined with explicit Shared Variable programming (compiler directives) used on machines with global memory – Symmetric Multi-Processors, CC-NUMA, PVP – These methods collectively known as Shared Memory Programming (SMP) – SMP programming model works at loop level, and coarse level parallelism: • the coarse level parallelism has to be specified explicitly • loop level parallelism can be found by the compiler (implicitly) – Explicit Message Passing Methods are necessary with machines that have no global memory addressability: • clusters of all sort, NOW & COW – Message Passing Methods require coarse level parallelism to be scalable •Choosing programming model is largely a matter of the application, personal preference and the target machine. •it has nothing to do with scalability. limitations: – communication overhead – process synchronization Scalability function of •scalability is mainly aparallelism the hardware and (your) implementation of the TM
  • 12. Summary TM • The serial part or the communication overhead of the code limits the scalability of the code (Amdahl Law) • programs have to be >99% parallel to use large (>30 proc) machines • several Programming Models are in use today: – Shared Memory programming (SMP) (with Automatic Compiler parallelization, Data-Parallel and explicit Shared Memory models) – Message Passing model • Choosing a Programming Model is largely a matter of the application, personal choice and target machine. It has nothing to do with scalability. – Don’t confuse Algorithm and implementation • machines with a global address space can run applications based on both, SMP and Message Passing programming models