PERFORMANCE
EVALUATION OF
PARALLEL COMPUTERS

CSE539
(Advanced Computer Architecture)

Sumit Mittu
(Assistant Professor, CSE/IT)
Lovely Professional University
BASICS OF
PERFORMANCE EVALUATION

A sequential algorithm is evaluated in terms of its execution time, which is
expressed as a function of its input size.
For a parallel algorithm, the execution time depends not only on the input size but also
on factors such as the parallel architecture, the no. of processors, etc.
Performance Metrics
Parallel Run Time
Speedup

Efficiency

Standard Performance Measures
Peak Performance
Sustained Performance
Instruction Execution Rate (in MIPS)
Floating Point Capability (in MFLOPS)
2
PERFORMANCE METRICS

Parallel Runtime
The parallel run time T(n) of a program or application is the time required to run
the program on an n-processor parallel computer.
When n = 1, T(1) denotes the sequential runtime of the program on a single processor.
Speedup
Speedup S(n) is defined as the ratio of the time taken to run a program on a single
processor to the time taken to run the program on a parallel computer with n
identical processors:

S(n) = T(1) / T(n)

It measures how much faster the program runs on a parallel computer than on a
single processor.
3
PERFORMANCE METRICS

Efficiency
The efficiency E(n) of a program on n processors is defined as the ratio of the
speedup achieved to the number of processors used to achieve it:

E(n) = S(n) / n

The relationship between execution time, speedup and efficiency and the
number of processors used is depicted in the graphs on the next slides.

In the ideal case:

Speedup is expected to be linear, i.e. it grows linearly with the number of
processors; in most cases, however, it falls below this ideal due to parallel overhead.
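
To make these definitions concrete, the minimal Python sketch below computes S(n) and E(n) from measured runtimes; the runtime values used here are hypothetical, chosen only for illustration.

def speedup(t1, tn):
    # S(n) = T(1) / T(n): how much faster the n-processor run is
    return t1 / tn

def efficiency(t1, tn, n):
    # E(n) = S(n) / n: fraction of ideal (linear) speedup actually achieved
    return speedup(t1, tn) / n

# Hypothetical measured runtimes in seconds for 1, 2, 4 and 8 processors
runtimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
t1 = runtimes[1]
for n, tn in runtimes.items():
    print(f"n={n}  T(n)={tn:5.1f} s  S(n)={speedup(t1, tn):4.2f}  E(n)={efficiency(t1, tn, n):4.2f}")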
4
PERFORMANCE METRICS

Graphs showing relationship b/w T(n) and no. of processors

<<<IMAGES>>>

5
PERFORMANCE METRICS

Graphs showing relationship b/w S(n) and no. of processors

<<<IMAGES>>>

6
PERFORMANCE METRICS

Graphs showing relationship b/w E(n) and no. of processors

<<<IMAGES>>>

7
PERFORMANCE MEASURES

Standard Performance Measures
Most of the standard measures adopted by the industry to compare the
performance of various parallel computers are based on the concepts of:
Peak Performance
[Theoretical maximum based on best possible utilization of all resources]
Sustained Performance
[based on running application-oriented benchmarks]

Generally measured in units of:
MIPS [to reflect instruction execution rate]
MFLOPS [to reflect the floating-point capability]
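
For concreteness, the short Python sketch below shows how these rate measures are computed from a run's instruction count, floating-point operation count and execution time; the counts and time used are made-up values, not benchmark results.

def mips(instruction_count, exec_time_s):
    # Millions of instructions executed per second
    return instruction_count / (exec_time_s * 1e6)

def mflops(flop_count, exec_time_s):
    # Millions of floating-point operations executed per second
    return flop_count / (exec_time_s * 1e6)

# Hypothetical counts and runtime for one benchmark run (illustrative only)
print(f"{mips(instruction_count=4.2e9, exec_time_s=3.5):.0f} MIPS")
print(f"{mflops(flop_count=1.4e9, exec_time_s=3.5):.0f} MFLOPS")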
8
PERFORMANCE MEASURES

Benchmarks
Benchmarks are a set of programs or program fragments used to compare the
performance of various machines.
Machines are exposed to these benchmark tests and tested for performance.
When it is not possible to test the actual applications on different machines, the
results of the benchmark programs that most closely resemble those applications
are used to evaluate the performance of a machine.

9
PERFORMANCE MEASURES

Benchmarks
Kernel Benchmarks
[Program fragments extracted from real programs]
[Heavily used core code, responsible for most of the execution time]
Synthetic Benchmarks
[Small programs created especially for benchmarking purposes]
[These benchmarks do not perform any useful computation]
EXAMPLES
LINPACK
LAPACK
Livermore Loops
SPECmarks
NAS Parallel Benchmarks
Perfect Club Parallel Benchmarks
10
PARALLEL OVERHEAD

Sources of Parallel Overhead
Parallel computers in practice do not achieve linear speedup or an efficiency of 1
because of parallel overhead, the major sources of which are:

•Inter-processor Communication
•Load Imbalance
•Inter-Task Dependency
•Extra Computation
•Parallel Balance Point
11
SPEEDUP
PERFORMANCE LAWS

Speedup Performance Laws

Amdahl’s Law
[based on fixed problem size or fixed work load]
Gustafson’s Law
[for scaled problems, where problem size increases with machine size
i.e. the number of processors]
Sun & Ni’s Law
[applied to scaled problems bounded by memory capacity]
12
SPEEDUP
PERFORMANCE LAWS

Amdahl’s Law (1967)
For a given problem size, the speedup does not increase linearly as the number of
processors increases. In fact, the speedup tends to become saturated.
This is a consequence of Amdahl’s Law.
According to Amdahl’s Law, a program contains two types of operations:
Completely sequential
Completely parallel
Let the time Ts taken to perform the sequential operations be a fraction α (0 < α ≤ 1) of
the total execution time T(1) of the program; then the time Tp to perform the parallel
operations is the fraction (1-α) of T(1).
13
SPEEDUP
PERFORMANCE LAWS

Amdahl’s Law
Thus, Ts = α.T(1) and Tp = (1-α).T(1)
Assuming that the parallel operations achieve linear speedup
(i.e. on n processors these operations take 1/n of the time they would take on one processor), then
T(n) = Ts + Tp/n = α.T(1) + (1-α).T(1)/n

Thus, the speedup with n processors will be:
S(n) = T(1) / T(n) = 1 / [α + (1-α)/n] = n / [1 + (n-1)α]
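
This fixed-workload speedup can be illustrated with a minimal Python sketch; the value of α used here is an arbitrary assumption chosen only to show how the speedup saturates at 1/α.

def amdahl_speedup(n, alpha):
    # S(n) = 1 / (alpha + (1 - alpha)/n) for sequential fraction alpha
    return 1.0 / (alpha + (1.0 - alpha) / n)

alpha = 0.05  # assumed sequential fraction (illustrative only)
for n in (1, 2, 4, 8, 16, 64, 1024):
    print(f"n={n:5d}  S(n)={amdahl_speedup(n, alpha):6.2f}")
print(f"Limit as n grows without bound: 1/alpha = {1 / alpha:.1f}")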

14
SPEEDUP
PERFORMANCE LAWS

Amdahl’s Law
Sequential operations will tend to dominate the speedup as n becomes very large.
As n → ∞, S(n) → 1/α

This means that no matter how many processors are employed, the speedup of
this problem is limited to 1/α.
This is known as the sequential bottleneck of the problem.
Note: The sequential bottleneck cannot be removed just by increasing the no. of processors.
15
SPEEDUP
PERFORMANCE LAWS

Amdahl’s Law
A major shortcoming in applying Amdahl's Law (inherent in the law itself):
the total work load or problem size is fixed,
so the execution time decreases with an increasing no. of processors.
A successful way of overcoming this shortcoming is therefore to increase the problem size!

16
SPEEDUP
PERFORMANCE LAWS

Amdahl’s Law

<<<GRAPH>>>

17
SPEEDUP
PERFORMANCE LAWS

Gustafson’s Law (1988)
It relaxed the restriction of fixed size of the problem and used the notion of fixed
execution time for getting over the sequential bottleneck.
According to Gustafson’s Law,
If the number of parallel operations in the problem is increased (or scaled up) sufficiently,
Then sequential operations will no longer be a bottleneck.

In accuracy-critical applications, it is desirable to solve a larger problem on a
larger machine rather than a smaller problem on a smaller machine, in almost
the same execution time.
18
SPEEDUP
PERFORMANCE LAWS

Gustafson’s Law
As the machine size increases, the work load (or problem size) is also increased so
as to keep the execution time for the problem fixed.
Let Ts be the constant time taken to perform the sequential operations, and
Tp(n,W) be the time taken to perform the parallel operations of problem size (or
workload) W using n processors.
Then the speedup with n processors is:

S'(n) = [Ts + Tp(1,W)] / [Ts + Tp(n,W)]
19
SPEEDUP
PERFORMANCE LAWS

Gustafson’s Law

<<<IMAGES>>>

20
SPEEDUP
PERFORMANCE LAWS

Gustafson’s Law
Assuming that parallel operations achieve a linear speedup
(i.e. on n processors these operations take 1/n of the time they would take on one processor),
then Tp(1,W) = n.Tp(n,W)
Let α be the fraction of sequential work load in the problem, i.e.

α = Ts / [Ts + Tp(n,W)]

Then the speedup with n processors can be expressed as:

S'(n) = [Ts + n.Tp(n,W)] / [Ts + Tp(n,W)] = α + n.(1-α) = n - (n-1).α
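
The scaled-workload (fixed-time) speedup derived above can be illustrated with a small Python sketch; the value of α is again an arbitrary assumption used only for illustration.

def gustafson_speedup(n, alpha):
    # S'(n) = alpha + n*(1 - alpha) = n - (n - 1)*alpha
    return n - (n - 1) * alpha

alpha = 0.05  # assumed sequential fraction of the parallel run (illustrative only)
for n in (1, 2, 4, 8, 16, 64, 1024):
    print(f"n={n:5d}  S'(n)={gustafson_speedup(n, alpha):8.2f}")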
21
SPEEDUP
PERFORMANCE LAWS

Sun & Ni’s Law (1993)
This law defines a memory bounded speedup model which generalizes both Amdahl’s Law
and Gustafson’s Law to maximize the use of both processor and memory capacities.

The idea is to solve the maximum possible problem size, limited by the memory capacity.
This inherently demands an increased or scaled work load, providing
higher speedup,
higher efficiency, and
better resource (processor & memory) utilization.

But it may result in a slight increase in execution time to achieve this scalable
speedup performance!
22
SPEEDUP
PERFORMANCE LAWS

Sun & Ni’s Law
According to this law, the speedup S*(n) in performance can be defined by:

S*(n) = [α + (1-α).G(n)] / [α + (1-α).G(n)/n]

where G(n) reflects the increase in work load as the memory capacity is increased n times.
Assumptions made while deriving the above expression:
•A global address space is formed from all the individual memory spaces, i.e. there is a
distributed shared memory space
•All available memory capacity is used up for solving the scaled problem.
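
A minimal Python sketch of this memory-bounded speedup expression follows; the value of α and the particular choices of G(n) are illustrative assumptions. Note that G(n) = 1 reproduces Amdahl's speedup and G(n) = n reproduces Gustafson's, matching the special cases on the next slide.

def sun_ni_speedup(n, alpha, g):
    # S*(n) = [alpha + (1 - alpha)*G(n)] / [alpha + (1 - alpha)*G(n)/n]
    gn = g(n)
    return (alpha + (1 - alpha) * gn) / (alpha + (1 - alpha) * gn / n)

alpha = 0.05  # assumed sequential fraction (illustrative only)
cases = {
    "G(n) = 1   (Amdahl)":         lambda n: 1,
    "G(n) = n   (Gustafson)":      lambda n: n,
    "G(n) = n*n (memory-bounded)": lambda n: n * n,
}
for name, g in cases.items():
    values = [round(sun_ni_speedup(n, alpha, g), 2) for n in (2, 8, 64)]
    print(f"{name}:  S*(n) for n = 2, 8, 64  ->  {values}")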
23
SPEEDUP
PERFORMANCE LAWS

Sun & Ni’s Law
Special Cases:
•G(n) = 1
Corresponds to the case of a fixed problem size, i.e. Amdahl's Law.
•G(n) = n
Corresponds to the case where the work load increases n times when the memory is increased n
times, i.e. the scaled problem of Gustafson's Law.
•G(n) > n
Corresponds to the case where the computational work load (time) increases faster than the memory
requirement.
Comparing the speedup factors S*(n), S'(n) and S(n), we shall find S*(n) ≥ S'(n) ≥ S(n).
24
SPEEDUP
PERFORMANCE LAWS

Sun & Ni’s Law

<<<IMAGES>>>

25
SCALABILITY METRIC

Scalability
– Increasing the no. of processors decreases the efficiency!
+ Increasing the amount of computation per processor increases the efficiency!
To keep the efficiency fixed, both the size of the problem and the no. of processors must be increased
simultaneously.
A parallel computing system is said to be scalable if its efficiency can be kept
fixed by simultaneously increasing the machine size and the problem size.

Scalability of a parallel system is the measure of its capacity to increase speedup in proportion to
the machine size.

26
SCALABILITY METRIC

Isoefficiency Function
The isoefficiency function can be used to measure the scalability of parallel computing
systems.
It shows how the size of the problem must grow as a function of the number of processors used in order to
maintain some constant efficiency.
The general form of the function is derived using an equivalent definition of efficiency
as follows:

E = U / (U + O)

where U is the time taken to do the useful computation (essential work), and
O is the parallel overhead. (Note: O is zero for sequential execution.)
27
SCALABILITY METRIC

Isoefficiency Function
If the efficiency is fixed at some constant value K, then

K = U / (U + O)   which gives   U = [K / (1-K)] . O = K'.O

where K' is a constant for the fixed efficiency K.
This function is known as the isoefficiency function of the parallel computing system.
A small isoefficiency function means that small increments in the problem size (U) are sufficient
for efficient utilization of an increasing no. of processors, indicating high scalability.
A large isoefficiency function indicates a poorly scalable system.
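
As an illustration of how the isoefficiency relation is used, the Python sketch below assumes a hypothetical overhead function O(n) and computes how much useful work U is needed to hold the efficiency at a target value K; the quadratic overhead form is made up purely for illustration.

def efficiency(u, o):
    # E = U / (U + O): useful computation over useful computation plus overhead
    return u / (u + o)

def required_useful_work(o, k):
    # Isoefficiency relation: E = K gives U = [K / (1 - K)] * O = K' * O
    return (k / (1.0 - k)) * o

def overhead(n):
    # Hypothetical parallel overhead O(n); the quadratic form is illustrative only
    return 50.0 * n * n

K = 0.8  # target (fixed) efficiency
for n in (2, 4, 8, 16):
    o = overhead(n)
    u = required_useful_work(o, K)
    print(f"n={n:2d}  O(n)={o:8.0f}  required U={u:9.0f}  check E={efficiency(u, o):.2f}")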
28
SCALABILITY METRIC

Isoefficiency Function

29
PERFORMANCE
MEASUREMENT TOOLS

Performance Analysis
Search Based Tools
Visualization
Utilization Displays
[Processor (utilization) count, Utilization Summary, Gantt charts, Concurrency Profile, Kiviat
Diagrams]
Communication Displays
[Message Queues, Communication Matrix, Communication Traffic, Hypercube]
Task Displays
[Task Gantt, Task Summary]
30
PERFORMANCE
MEASUREMENT TOOLS

31
PERFORMANCE
MEASUREMENT TOOLS

32
PERFORMANCE
MEASUREMENT TOOLS

33
PERFORMANCE
MEASUREMENT TOOLS

34
PERFORMANCE
MEASUREMENT TOOLS

35
PERFORMANCE
MEASUREMENT TOOLS

Instrumentation
A way to collect data about an application is to instrument the application executable
so that when it executes, it generates the required information as a side-effect.
Ways to do instrumentation:
By inserting it into the application source code directly, or
By placing it into the runtime libraries, or
By modifying the linked executable, etc.
Doing this causes some perturbation of the application program
(i.e. the intrusion problem).
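
As a minimal illustration of the first approach (inserting instrumentation directly into the application source code), the Python sketch below wraps a function so that it reports its own execution time as a side-effect of running; the wrapped function is a placeholder, not part of any real application.

import time
import functools

def instrument(fn):
    # Source-level instrumentation: report the wall-clock time of each call as a side-effect
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"[instr] {fn.__name__} took {elapsed:.6f} s")
    return wrapper

@instrument
def compute_kernel(size):
    # Placeholder for real application code
    return sum(i * i for i in range(size))

compute_kernel(100_000)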
36
PERFORMANCE
MEASUREMENT TOOLS

Instrumentation
Intrusion includes both:
Direct contention for resources (e.g. CPU, memory, communication links, etc.)
Secondary interference with resources (e.g. interaction with cache replacements or virtual
memory, etc.)
To address such effects, you may adopt the following approaches:
Realizing that intrusion affects measurement, treat the resulting data as an
approximation
Leave the added instrumentation in the final implementation.
Try to minimize the intrusion.
Quantify the intrusion and compensate for it!
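
One simple way to follow the last suggestion, i.e. to quantify the intrusion and compensate for it, is to measure the cost of the timing probe itself and subtract it from the recorded times. The Python sketch below illustrates that idea; the measured region, its raw time and its probe count are hypothetical values used only for illustration.

import time

def probe_overhead(trials=10_000):
    # Estimate the average cost of one timing probe (a pair of timer reads)
    start = time.perf_counter()
    for _ in range(trials):
        # the two timer reads a typical probe would issue
        time.perf_counter()
        time.perf_counter()
    return (time.perf_counter() - start) / trials

def compensated_time(raw_seconds, num_probes, per_probe):
    # Subtract the estimated instrumentation cost from a raw measurement
    return raw_seconds - num_probes * per_probe

per_probe = probe_overhead()
raw_time = 0.120      # hypothetical measured time of an instrumented region (seconds)
probes_fired = 500    # hypothetical number of probes executed inside that region
print(f"per-probe overhead  ~ {per_probe * 1e6:.2f} microseconds")
print(f"compensated time    ~ {compensated_time(raw_time, probes_fired, per_probe):.6f} s")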
37
