Advanced Computer Architecture
Program Partitioning and Scheduling
Program partitioning
The transformation of a sequentially coded program
into a parallel executable form can be done
manually by the programmer, using explicit
parallelism, or automatically by a compiler that
detects implicit parallelism.
Program partitioning determines whether a given
program can be partitioned or split into pieces that
can be executed in parallel or that must follow a
prespecified order of execution.
Program Partitioning & Scheduling
The size of the parts or pieces of a program that
can be considered for parallel execution can vary.
The sizes are roughly classified using the term
“granule size,” or simply “granularity.”
The simplest measure, for example, is the number
of instructions in a program part.
Grain sizes are usually described as fine, medium
or coarse, depending on the level of parallelism
involved.
Latency
Latency is the time required for communication
between different subsystems in a computer.
Memory latency, for example, is the time required
by a processor to access memory.
Synchronization latency is the time required for two
processes to synchronize their execution.
Communication latency is the time required for
interprocessor communication.
Levels of Parallelism
Level 5 (coarse grain): jobs or programs
Level 4 (coarse/medium grain): subprograms, job steps, or
related parts of a program
Level 3 (medium grain): procedures, subroutines, tasks, or
coroutines
Level 2 (fine grain): non-recursive loops or unfolded iterations
Level 1 (fine grain): instructions or statements
Moving from coarse grain toward fine grain, the degree of
parallelism increases, but so do the communication demand
and the scheduling overhead.
Instruction Level Parallelism
This is the finest-granularity level, typically
involving fewer than 20 instructions per grain.
The number of candidates for parallel execution
varies from two to thousands, with about five
instructions or statements per grain on average.
Advantages: The exploitation of fine-grain parallelism can
be assisted by an optimizing compiler, which should be able
to detect parallelism automatically and translate the source
code into a parallel form recognizable by the run-time
system.
Loop-level Parallelism
A typical loop contains fewer than 500 instructions.
If the loop iterations are independent of one
another, the loop can be handled by a pipeline or by
a SIMD machine.
Loops are among the best-suited program
constructs for execution on a parallel or vector
machine.
Some loops (e.g., recursive loops) are difficult to handle.
Loop-level parallelism is still considered fine-grain
computation.
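As a minimal illustration (not from the slides), the Python sketch below shows why independent iterations matter: since no iteration reads another's result, the loop body can be mapped over a pool of workers and still produce the sequential answer.

```python
from multiprocessing.dummy import Pool  # thread-based pool, for illustration

def body(i):
    # Each iteration uses only its own index, so iterations are
    # independent and may run in any order, or in parallel.
    return i * i

data = list(range(8))

# The sequential loop.
sequential = [body(i) for i in data]

# The same loop mapped over four workers, SIMD-style.
with Pool(4) as pool:
    parallel = pool.map(body, data)

print(sequential == parallel)  # True: independence preserves the result
```

A pipeline or SIMD machine exploits the same property in hardware, issuing the iterations in overlapped or lockstep fashion.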
Procedure-level Parallelism
Medium-sized grain; usually fewer than 2000
instructions.
Detection of parallelism is more difficult than with
smaller grains; interprocedural dependence
analysis is difficult and history-sensitive.
The communication requirement is less demanding
than at the instruction level.
SPMD (single program, multiple data) execution
mode is a special case at this level.
Multitasking also belongs to this level.
Subprogram-level Parallelism
Job-step level; a grain typically contains thousands
of instructions; medium- or coarse-grain level.
Job steps can overlap across different jobs.
Multiprogramming is conducted at this level.
In the past, parallelism at this level has been
exploited by algorithm designers or programmers
rather than by compilers.
Good compilers for exploiting medium- or
coarse-grain parallelism are still not available.
Job or Program-Level Parallelism
Corresponds to the execution of essentially
independent jobs or programs on a parallel
computer.
The grain size can be as large as tens of thousands
of instructions in a single program.
For supercomputers with a small number of very
powerful processors, such coarse-grain parallelism
is practical.
Time-sharing and space-sharing multiprocessors
exploit this level of parallelism.
Job-level parallelism is handled by the program
loader or by the operating system.
Summary
Fine-grain parallelism is exploited at the instruction and loop
levels, assisted by the compiler.
Medium-grain parallelism (task or job step) requires both
programmer and compiler support.
Coarse-grain parallelism relies heavily on effective OS support.
Shared-variable communication is used at the fine- and
medium-grain levels.
Message passing can be used for medium- and coarse-grain
communication, but fine-grain parallelism really needs better
techniques because of its heavier communication requirements.
Communication Latency
Balancing granularity and latency can yield better
performance.
The various latencies are attributable to the machine
architecture, the implementation technology, and the
communication patterns used.
Latency imposes a limiting factor on machine
scalability. For example, memory latency increases
as memory capacity increases, limiting the amount of
memory that can be used within a given tolerance for
communication latency.
Interprocessor Communication
Latency
Needs to be minimized by the system designer.
Affected by signal delays and by the communication
patterns used.
For example, n communicating tasks may require n(n - 1)/2
communication links; this quadratic growth
effectively limits the number of processors in the
system.
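The quadratic growth is easy to check numerically; the sketch below (illustrative, with a hypothetical helper name) tabulates the link count:

```python
def links_needed(n):
    # Point-to-point links required so that n tasks can all
    # communicate pairwise: n choose 2 = n(n - 1)/2.
    return n * (n - 1) // 2

for n in (4, 8, 16, 64):
    print(n, links_needed(n))
# 4 -> 6, 8 -> 28, 16 -> 120, 64 -> 2016: quadratic growth
```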
Communication Patterns
Determined by the algorithms used and by the
architectural support provided.
Patterns include
permutations
broadcast
multicast
conference (many-to-many)
Tradeoffs often exist between the granularity of
parallelism and the communication demand.
Grain Packing and Scheduling
Two questions:
How can I partition a program into parallel “pieces” to
yield the shortest execution time?
What is the optimal size of parallel grains?
There is an obvious tradeoff between the time
spent scheduling and synchronizing parallel grains
and the speedup obtained by parallel execution.
One approach to the problem is called “grain
packing.”
EENG-630 - Chapter 2
Grain packing and scheduling
The grain-size problem requires determining both
the number of partitions and the size of the grains
in a parallel program.
The solution is both problem-dependent and
machine-dependent.
We want a short schedule for fast execution of the
subdivided program modules.
Grain determination and
scheduling optimization
Step 1: Construct a fine-grain program graph
Step 2: Schedule the fine-grain computation
Step 3: Grain packing to produce coarse grains
Step 4: Generate a parallel schedule based on
the packed graph
Program Graphs and Packing
A program graph is similar to a dependence graph
Nodes = { (n,s) }, where n = node name, s = size (larger
s = larger grain size).
Edges = { (v,d) }, where v = variable being
“communicated,” and d = communication delay.
Packing two (or more) nodes produces a node with
a larger grain size and possibly more edges to
other nodes.
Packing is done to eliminate unnecessary
communication delays or reduce overall scheduling
overhead.
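The packing operation itself can be sketched in a few lines of Python. The graph below is hypothetical (node names, variables, and delays invented for illustration), but the (n,s) node labels and (v,d) edge labels follow the convention above:

```python
def pack(nodes, edges, group, new_name):
    """Pack the fine-grain nodes in `group` into one coarse-grain node.

    `nodes` maps node name -> grain size s; `edges` maps
    (src, dst) -> (variable v, delay d).  Grain sizes add; edges that
    become internal to the packed node are dropped (their delays turn
    into negligible intra-processor delays); boundary edges are
    redirected to the new node.
    """
    packed = {n: s for n, s in nodes.items() if n not in group}
    packed[new_name] = sum(nodes[n] for n in group)
    new_edges = {}
    for (src, dst), label in edges.items():
        s = new_name if src in group else src
        d = new_name if dst in group else dst
        if s != d:                     # drop now-internal edges
            new_edges[(s, d)] = label
    return packed, new_edges

# Hypothetical three-node graph: nodes 1 and 2 feed node 7.
nodes = {"1": 1, "2": 1, "7": 2}
edges = {("1", "7"): ("x", 6), ("2", "7"): ("y", 6)}

coarse_nodes, coarse_edges = pack(nodes, edges, {"1", "2", "7"}, "A")
print(coarse_nodes)  # {'A': 4} -- grain sizes summed
print(coarse_edges)  # {} -- both communication delays eliminated
```

Packing all three nodes removes both 6-cycle delays, which is exactly the motivation stated above.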
The basic concept of program partitioning is introduced here. The figure
shows an example program graph in two different grain sizes.
A program graph shows the structure of a program.
It is very similar to a dependence graph.
Each node in the program graph corresponds to a computational unit
in the program.
The grain size is measured by the number of basic machine
cycles (including both processor and memory cycles) needed to
execute all the operations within the node.
Each node in the figure is denoted by a pair (n, s), where n is the node
name (id) and s is the grain size of the node.
Thus the grain size reflects the number of computations involved in a
program segment.
Fine-grain nodes have a smaller grain size, and coarse-grain nodes
have a larger grain size.
The edge label (v, d) between two end nodes specifies the output
variable v of the source node (the input variable of the
destination node) and the communication delay d between them.
This delay includes all the path delays and memory latency
involved.
There are 17 nodes in the fine-grain program graph (Fig. a) and 5
in the coarse-grain program graph (Fig. b).
Each coarse-grain node is obtained by combining (grouping)
multiple fine-grain nodes.
The fine-grain graph corresponds to the following program:
Nodes 1, 2, 3, 4, 5, and 6 are memory reference (data fetch)
operations.
Each takes one cycle to address and six cycles to fetch
from memory.
All remaining nodes (7 to 17) are CPU operations, each
requiring two cycles to complete. After packing, the
coarse-grain nodes have larger grain sizes, ranging
from 4 to 8 as shown.
The node (A, 8) in Fig. (b) is obtained by combining the
nodes (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), and (11, 2) in
Fig. (a). The grain size of node A, 8, is the sum of all the
grain sizes being combined (1 + 1 + 1 + 1 + 1 + 1 + 2 = 8).
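The same bookkeeping in Python, using the node ids and sizes quoted above:

```python
# Grain sizes of the fine-grain nodes combined into coarse-grain node A:
# nodes 1-6 of size 1 each, plus node 11 of size 2.
sizes = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 11: 2}

size_A = sum(sizes.values())
print(size_A)  # 8, matching the label (A, 8) in the coarse-grain graph
```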
The idea of grain packing is to apply fine-grain partitioning first, in
order to achieve a higher degree of parallelism, and then to combine
(pack) multiple fine-grain nodes into a coarse-grain node wherever
doing so eliminates unnecessary communication delays or reduces the
overall scheduling overhead.
Usually, all fine-grain operations within a single coarse-grain node are
assigned to the same processor for execution. A fine-grain partition of a
program often demands more interprocessor communication than a
coarse-grain partition does. Thus grain packing offers a tradeoff
between parallelism and scheduling/communication overhead.
Internal delays among fine-grain operations within the same coarse-grain
node are negligible, because communication delay is contributed
mainly by interprocessor delays rather than by delays within the
same processor.
The optimal grain size is the one that achieves the shortest schedule
for the nodes on a parallel computer system.
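A toy calculation (numbers invented, not taken from the figure) makes the tradeoff concrete for two dependent grains:

```python
# Invented numbers: two dependent grains, each costing 2 cycles, with a
# 6-cycle communication delay when placed on different processors and a
# negligible delay when packed onto one processor.
grain_cost, comm_delay = 2, 6

# Fine grain on two processors: compute, ship the result, compute.
fine_schedule = grain_cost + comm_delay + grain_cost

# Packed onto one processor: intra-processor delay is negligible.
packed_schedule = grain_cost + grain_cost

print(fine_schedule, packed_schedule)  # 10 4
```

The reverse can hold for independent grains: packing them onto one processor serializes work that could have proceeded in parallel, which is exactly the tradeoff between parallelism and scheduling/communication overhead described above.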
THANK YOU

More Related Content

What's hot

Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applicationsBurhan Ahmed
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docxkumari36
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networksPrasenjit Dey
 
Program and Network Properties
Program and Network PropertiesProgram and Network Properties
Program and Network PropertiesBeekrum Duwal
 
parallel language and compiler
parallel language and compilerparallel language and compiler
parallel language and compilerVignesh Tamil
 
Flynns classification
Flynns classificationFlynns classification
Flynns classificationYasir Khan
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processingPage Maker
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureBalaji Vignesh
 
Architecture of Mobile Computing
Architecture of Mobile ComputingArchitecture of Mobile Computing
Architecture of Mobile ComputingJAINIK PATEL
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed ComputingRicha Singh
 
Parallel computing chapter 3
Parallel computing chapter 3Parallel computing chapter 3
Parallel computing chapter 3Md. Mahedi Mahfuj
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingSayed Chhattan Shah
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computingVajira Thambawita
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classificationNarayan Kandel
 
Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingVimal Dewangan
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 

What's hot (20)

Parallel computing and its applications
Parallel computing and its applicationsParallel computing and its applications
Parallel computing and its applications
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docx
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networks
 
Program and Network Properties
Program and Network PropertiesProgram and Network Properties
Program and Network Properties
 
parallel language and compiler
parallel language and compilerparallel language and compiler
parallel language and compiler
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Pram model
Pram modelPram model
Pram model
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 
Architecture of Mobile Computing
Architecture of Mobile ComputingArchitecture of Mobile Computing
Architecture of Mobile Computing
 
Load Balancing In Distributed Computing
Load Balancing In Distributed ComputingLoad Balancing In Distributed Computing
Load Balancing In Distributed Computing
 
Parallel computing chapter 3
Parallel computing chapter 3Parallel computing chapter 3
Parallel computing chapter 3
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
Feng’s classification
Feng’s classificationFeng’s classification
Feng’s classification
 
Osi reference model
Osi reference modelOsi reference model
Osi reference model
 
Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shaping
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 

Similar to program partitioning and scheduling IN Advanced Computer Architecture

Hardback solution to accelerate multimedia computation through mgp in cmp
Hardback solution to accelerate multimedia computation through mgp in cmpHardback solution to accelerate multimedia computation through mgp in cmp
Hardback solution to accelerate multimedia computation through mgp in cmpeSAT Publishing House
 
01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.pptHarshitPal37
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacmlmphuong06
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET Journal
 
Complier design
Complier design Complier design
Complier design shreeuva
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGDYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGcscpconf
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computingpurplesea
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPIJSRED
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...hrmalik20
 
Cloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computingCloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computinghrmalik20
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfKishaKiddo
 
LOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSLOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSijdpsjournal
 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collectionsijdpsjournal
 
paradigms cloud.pptx
paradigms cloud.pptxparadigms cloud.pptx
paradigms cloud.pptxgunvinit931
 

Similar to program partitioning and scheduling IN Advanced Computer Architecture (20)

Compiler design
Compiler designCompiler design
Compiler design
 
Hardback solution to accelerate multimedia computation through mgp in cmp
Hardback solution to accelerate multimedia computation through mgp in cmpHardback solution to accelerate multimedia computation through mgp in cmp
Hardback solution to accelerate multimedia computation through mgp in cmp
 
01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt01-MessagePassingFundamentals.ppt
01-MessagePassingFundamentals.ppt
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacm
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
Complier design
Complier design Complier design
Complier design
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGDYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...
 
Cloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computingCloud computing system models for distributed and cloud computing
Cloud computing system models for distributed and cloud computing
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
 
LOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONSLOCK-FREE PARALLEL ACCESS COLLECTIONS
LOCK-FREE PARALLEL ACCESS COLLECTIONS
 
Lock free parallel access collections
Lock free parallel access collectionsLock free parallel access collections
Lock free parallel access collections
 
Resisting skew accumulation
Resisting skew accumulationResisting skew accumulation
Resisting skew accumulation
 
paradigms cloud.pptx
paradigms cloud.pptxparadigms cloud.pptx
paradigms cloud.pptx
 
Pregel - Paper Review
Pregel - Paper ReviewPregel - Paper Review
Pregel - Paper Review
 

Recently uploaded

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 

Program Partitioning and Scheduling in Advanced Computer Architecture

  • 1. Advanced Computer Architecture Program Partitioning and Scheduling
  • 2. Program partitioning The transformation of a sequentially coded program into a parallel executable form can be done manually by the programmer using explicit parallelism, or automatically by a compiler detecting implicit parallelism. Program partitioning determines whether the given program can be partitioned or split into pieces that can be executed in parallel or follow a certain prespecified order of execution.
  • 3. Program Partitioning & Scheduling The size of the parts or pieces of a program that can be considered for parallel execution can vary. The sizes are roughly classified using the term “granule size,” or simply “granularity.” The simplest measure, for example, is the number of instructions in a program part. Grain sizes are usually described as fine, medium or coarse, depending on the level of parallelism involved.
  • 4. Latency Latency is the time required for communication between different subsystems in a computer. Memory latency, for example, is the time required by a processor to access memory. Synchronization latency is the time required for two processes to synchronize their execution. Communication latency is the time required for interprocessor communication.
  • 5. Levels of Parallelism From coarse to fine grain: jobs or programs; subprograms, job steps, or related parts of a program (coarse grain); procedures, subroutines, tasks, or coroutines (medium grain); non-recursive loops or unfolded iterations; and instructions or statements (fine grain). Moving toward finer grain yields a higher degree of parallelism but increasing communication demand and scheduling overhead.
  • 6. Instruction Level Parallelism This fine-grain, or smallest-granularity, level typically involves less than 20 instructions per grain. The number of candidates for parallel execution varies from 2 to thousands, with about five instructions or statements being the average degree of parallelism. Advantages: The exploitation of fine-grain parallelism can be assisted by an optimizing compiler, which should be able to automatically detect parallelism and translate the source code to a parallel form that can be recognized by the run-time system.
  • 7. Loop-level Parallelism A typical loop has less than 500 instructions. If a loop operation is independent between iterations, it can be handled by a pipeline or by a SIMD machine. Loops are the most optimized program construct to execute on a parallel or vector machine. Some loops (e.g. recursive ones) are difficult to handle. Loop-level parallelism is still considered fine-grain computation.
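The distinction above, between iteration-independent loops and loops with carried dependences, can be sketched in Python. This is a hypothetical illustration (the arrays and the squaring function are made up, not from the slides): the first loop's iterations touch disjoint data and can run in any order, while the second loop's iterations must run sequentially.

```python
from concurrent.futures import ThreadPoolExecutor

# Iteration-independent loop (parallelizable): each result depends
# only on its own input element, so iterations may run concurrently.
a = list(range(8))
with ThreadPoolExecutor() as pool:
    b = list(pool.map(lambda x: x * x, a))

# Loop-carried dependence (hard to parallelize): each c[i]
# needs the value produced by the previous iteration, c[i-1].
c = [1]
for i in range(1, 8):
    c.append(c[i - 1] * 2)
```

A vectorizing compiler or SIMD machine exploits exactly the first pattern; the second forces serial execution unless the recurrence can be restructured.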
  • 8. Procedure-level Parallelism Medium-sized grain; usually less than 2000 instructions. Detection of parallelism is more difficult than with smaller grains; interprocedural dependence analysis is difficult and history-sensitive. The communication requirement is less than at the instruction level. SPMD (single procedure multiple data) execution mode is a special case at this level. Multitasking belongs to this level.
  • 9. Subprogram-level Parallelism Job step level; a grain typically has thousands of instructions; medium- or coarse-grain level. Job steps can overlap across different jobs. Multiprogramming is conducted at this level. In the past, parallelism at this level has been exploited by algorithms or programmers, rather than by compilers. We do not have good compilers available to exploit medium- or coarse-grain parallelism at present.
  • 10. Job or Program-Level Parallelism Corresponds to the execution of essentially independent jobs or programs on a parallel computer. The grain size can be as large as tens of thousands of instructions in a single program. For supercomputers with a small number of very powerful processors, such coarse-grain parallelism is practical. Time-sharing or space-sharing multiprocessors exploit this level of parallelism. Job-level parallelism is handled by the program loader or by the operating system.
  • 11. Summary Fine grain is exploited at the instruction or loop levels, assisted by the compiler. Medium grain (task or job step) requires programmer and compiler support. Coarse grain relies heavily on effective OS support. Shared-variable communication is used at the fine- and medium-grain levels. Message passing can be used for medium- and coarse-grain communication, but fine grain really needs a better technique because of its heavier communication requirements.
  • 12. Communication Latency Balancing granularity and latency can yield better performance. Various latencies attributed to machine architecture, technology, and communication patterns used. Latency imposes a limiting factor on machine scalability. Ex. Memory latency increases as memory capacity increases, limiting the amount of memory that can be used with a given tolerance for communication latency.
  • 13. Interprocessor Communication Latency Needs to be minimized by the system designer. Affected by signal delays and communication patterns. Ex. n communicating tasks may require n(n - 1)/2 communication links; this complexity grows quadratically, effectively limiting the number of processors in the system.
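The quadratic growth of pairwise links can be checked with a one-line helper (a minimal sketch; the function name is ours, not from the slides):

```python
def link_count(n):
    """Number of links needed for direct pairwise communication
    among n tasks: n * (n - 1) / 2."""
    return n * (n - 1) // 2

print(link_count(4))    # 6
print(link_count(100))  # 4950
```

Going from 4 tasks to 100 multiplies the task count by 25 but the link count by over 800, which is why fully connected designs do not scale.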
  • 14. Communication Patterns Determined by the algorithms used and the architectural support provided. Patterns include permutations, broadcast, multicast, and conference. Tradeoffs often exist between granularity of parallelism and communication demand.
  • 15. Grain Packing and Scheduling Two questions: How can I partition a program into parallel “pieces” to yield the shortest execution time? What is the optimal size of parallel grains? There is an obvious tradeoff between the time spent scheduling and synchronizing parallel grains and the speedup obtained by parallel execution. One approach to the problem is called “grain packing.”
  • 16. EENG-630 - Chapter 2 Grain packing and scheduling The grain size problem requires determination of both the number of partitions and the size of grains in a parallel program. The solution is problem-dependent and machine-dependent. We want a short schedule for fast execution of the subdivided program modules.
  • 17. Grain determination and scheduling optimization Step 1: Construct a fine-grain program graph. Step 2: Schedule the fine-grain computation. Step 3: Apply grain packing to produce coarse grains. Step 4: Generate a parallel schedule based on the packed graph.
  • 18. Program Graphs and Packing A program graph is similar to a dependence graph. Nodes = { (n, s) }, where n = node name and s = size (larger s = larger grain size). Edges = { (v, d) }, where v = the variable being "communicated" and d = the communication delay. Packing two (or more) nodes produces a node with a larger grain size and possibly more edges to other nodes. Packing is done to eliminate unnecessary communication delays or reduce overall scheduling overhead.
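A program graph and a packing step can be sketched in a few lines of Python. This is a toy model under our own assumptions (the node names, sizes, and delays are invented, and `pack` is our helper, not an algorithm from the slides): nodes map a name to a grain size s, edges carry (variable, delay) labels, and packing merges a group of nodes into one coarse node whose size is the sum of the members' sizes, dropping edges that become internal.

```python
# name -> grain size s; edges are (src, dst, variable v, delay d)
nodes = {"1": 1, "2": 1, "7": 2, "8": 2}
edges = [("1", "7", "a", 6), ("2", "7", "b", 6), ("7", "8", "g", 6)]

def pack(nodes, edges, group, new_name):
    """Pack the nodes in `group` into one coarse-grain node.
    Its size is the sum of the members' sizes; edges internal to
    the group disappear (no interprocessor delay inside a node)."""
    packed = {n: s for n, s in nodes.items() if n not in group}
    packed[new_name] = sum(nodes[n] for n in group)
    new_edges = []
    for src, dst, v, d in edges:
        s2 = new_name if src in group else src
        d2 = new_name if dst in group else dst
        if s2 != d2:  # keep only edges that still cross node boundaries
            new_edges.append((s2, d2, v, d))
    return packed, new_edges

p_nodes, p_edges = pack(nodes, edges, {"1", "2", "7"}, "A")
print(p_nodes)  # node A has size 1 + 1 + 2 = 4; node 8 survives
print(p_edges)  # only the edge leaving the packed group remains
```

Note how the two delay-6 edges into node 7 vanish after packing: that is exactly the "eliminate unnecessary communication delays" motivation on the slide.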
  • 19. [Figure: example program graphs in fine-grain and coarse-grain form (not reproduced)]
  • 20. The basic concept of program partitioning is introduced here. In the figure we show an example program graph in two different grain sizes. A program graph shows the structure of a program; it is very similar to a dependence graph. Each node in the program graph corresponds to a computational unit in the program. The grain size is measured by the number of basic machine cycles (including both processor and memory cycles) needed to execute all the operations within the node. We denote each node by a pair (n, s), where n is the node name (id) and s is the grain size of the node. Thus grain size reflects the number of computations involved in a program segment. Fine-grain nodes have a smaller grain size, and coarse-grain nodes have a larger grain size.
  • 21. The edge label (v, d) between two end nodes specifies the output variable v from the source node (or the input variable to the destination node) and the communication delay d between them. This delay includes all the path delays and memory latency involved. There are 17 nodes in the fine-grain program graph (Fig. a) and 5 in the coarse-grain program graph (Fig. b). A coarse-grain node is obtained by combining (grouping) multiple fine-grain nodes. The fine grain corresponds to the following program:
  • 22. [Fine-grain program listing (not reproduced)]
  • 23. Nodes 1, 2, 3, 4, 5, and 6 are memory reference (data fetch) operations. Each takes one cycle to address and six cycles to fetch from memory. All remaining nodes (7 to 17) are CPU operations, each requiring two cycles to complete. After packing, the coarse-grain nodes have larger grain sizes ranging from 4 to 8 as shown. The node (A, 8) in Fig. (b) is obtained by combining the nodes (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), and (11, 2) in Fig. (a). The grain size, 8, of node A is the summation of all the grain sizes (1 + 1 + 1 + 1 + 1 + 1 + 2 = 8) being combined.
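The packing arithmetic for node A can be verified directly (a trivial check of the slide's sum, nothing more):

```python
# Node (A, 8) combines six size-1 memory-fetch nodes (1-6)
# and one size-2 CPU node (11); the coarse grain size is the sum.
fine_sizes = [1, 1, 1, 1, 1, 1, 2]
coarse_size = sum(fine_sizes)
print(coarse_size)  # 8
```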
  • 24. The idea of grain packing is to apply fine grain first in order to achieve a higher degree of parallelism. Then one combines (packs) multiple fine-grain nodes into a coarse-grain node if doing so can eliminate unnecessary communication delays or reduce the overall scheduling overhead. Usually, all fine-grain operations within a single coarse-grain node are assigned to the same processor for execution. A fine-grain partition of a program often demands more interprocessor communication than that required in a coarse-grain partition. Thus grain packing offers a trade-off between parallelism and scheduling/communication overhead. Internal delays among fine-grain operations within the same coarse-grain node are negligible, because the communication delay is contributed mainly by interprocessor delays rather than by delays within the same processor. The choice of the optimal grain size is meant to achieve the shortest schedule for the nodes on a parallel computer system.