High Performance Computing
Introduction to Parallel Computing
Acknowledgements
Content of the following presentation is borrowed from the Lawrence Livermore National Laboratory: https://hpc.llnl.gov/training/tutorials
Serial Computing
• Traditionally, software has been
written for serial computation:
• A problem is broken into a
discrete series of instructions
• Instructions are executed
sequentially one after another
• Executed on a single processor
• Only one instruction may execute
at any moment in time
Parallel Computing
▪ Simultaneous use of multiple
compute resources to solve a
computational problem.
▪ Run on multiple CPUs
▪ Problem is decomposed into
multiple parts that can be solved
concurrently.
▪ Each part is decomposed into a
set of instructions.
▪ Instructions are executed
simultaneously on different CPUs
Parallel Computer Architecture
Example: Networks connect multiple stand-
alone computers (nodes) to make larger parallel
computer clusters.
Compute Resources
• Single Computer with
multiple processors.
• Arbitrary Number of
Computers connected
by a network.
• Combination of both
Parallel Computer Memory Architectures
Shared Memory
[Diagram: multiple CPUs connected through an interconnect to a common memory, all sharing the same address space.]
Parallel Computer Memory Architectures
Shared Memory
Advantages:
• Global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is both fast and uniform due to the proximity of
memory to CPUs
Disadvantages:
• Lack of scalability between memory and CPUs.
• Programmer responsibility for synchronization constructs that ensure "correct"
access of global memory.
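Shared-memory parallelism is typically expressed with threads. A minimal sketch using OpenMP in C (the loop and values are illustrative, not from the slides): every thread sees the same address space, and the reduction clause supplies the synchronization needed for "correct" access to the shared result.

#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;   /* shared: all threads see the same address */

    /* Each thread adds a share of the terms; reduction(+:sum) makes the
       concurrent updates to the shared variable safe. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (double)(i + 1);
    }

    printf("harmonic sum = %f\n", sum);
    return 0;
}

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.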
Parallel Computer Memory Architectures
Distributed Memory
Requires a communication network to connect inter-processor memory.
Processors have their own local memory. Memory addresses
in one processor do not map to another processor, so there
is no concept of global address space across all processors.
Changes a processor makes to its local memory have no effect on the memory of other processors.
It is the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
The network "fabric" used for data transfer varies widely.
[Diagram: four CPUs, each with its own local memory, connected by a network.]
Parallel Computer Memory Architectures
Distributed Memory
Advantages:
• Scalable Memory: memory scales with the number of processors.
• Each processor can rapidly access its own memory.
• Cost effective: can use commodity, off-the-shelf processors and networking.
Disadvantages:
• Programmer is responsible for details associated with data communication
between processors.
• It may be difficult to map existing data structures, based on global memory, to
this memory organization.
• Non-uniform memory access times - data residing on a remote node takes longer
to access than node local data.
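With distributed memory, data movement must be written explicitly. A minimal MPI sketch in C (illustrative, not from the slides): each rank owns its local variable, and rank 0 can see rank 1's value only after a message is sent. Run with at least two ranks, e.g. mpirun -np 2.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank * 10;   /* exists only in this rank's local memory */

    if (rank == 0) {
        int remote;
        /* Rank 0 cannot address rank 1's memory; the value must be received. */
        MPI_Recv(&remote, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received %d from rank 1\n", remote);
    } else if (rank == 1) {
        MPI_Send(&local, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}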
Parallel Computer Memory Architectures
Hybrid Memory
The largest and fastest computers in the world today employ both shared and distributed memory architectures.
The shared memory component can be a shared memory
machine and/or graphics processing units (GPU).
The distributed memory component is the networking of
multiple shared memory/GPU machines, which know only
about their own memory - not the memory on another
machine.
Network communications are required to move data from
one machine to another.
Trends indicate that this type of memory architecture will prevail and increase at the high end of computing for the foreseeable future.
[Diagram: two shared-memory nodes, each with four CPUs and their own memory, connected by a network.]
Parallel Computing Terminology
Supercomputing / High Performance Computing (HPC): using the world's fastest and largest computers to solve large problems.
Node: a standalone "computer in a box", usually comprising multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to form a supercomputer.
CPU / Processor / Core: it depends... (a node has multiple cores or processors)
Logical View of a Supercomputer
Limits and Costs of Parallel Programming
Observed speedup of a code which has been parallelized is defined as:
speedup = wall-clock time of serial execution / wall-clock time of parallel execution
Parallel programs contain:
• Serial sections
• Parallel sections
Amdahl's Law
[Diagram: a workload split into a parallelizable portion (Time = 5 units) and a serial portion (Time = 3 units); adding processors shrinks only the parallelizable portion.]
Speedup is limited by the non-parallelizable/serial portion of the work.
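Stated as a formula (the standard form of the law, added here for reference; P is the parallelizable fraction of the work and N the number of processors):

S(N) = \frac{1}{(1 - P) + \frac{P}{N}}

For the picture above P = 5/8, so even with unlimited processors the 3 serial units remain and the speedup can never exceed 8/3 ≈ 2.67.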
Gustafson's Law
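A standard statement of the law (added for reference, since the slide gives only the name; s is the serial fraction of the scaled workload and N the number of processors):

S(N) = s + (1 - s)N = N - s(N - 1)

Here the problem size grows with the processor count, so the serial fraction stays fixed and the scaled speedup grows nearly linearly with N.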
Amdahl's Law & Gustafson's Law
• Amdahl's Law / Strong Scaling: how quickly can we complete analysis on a particular data set by increasing the processor count?
• Gustafson's Law / Weak Scaling: can we analyze more data in approximately the same amount of time by increasing the processor count?
Moving from Serial to Parallel Code
• Can the problem be parallelized?
• Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of the formula F(n) = F(n-1) + F(n-2) is a classic example of a hard-to-parallelize problem (see the sketch after this list).
• Identify the program's hotspots
• Identify bottlenecks in the program
• Identify Data Dependencies and Task Dependencies (inhibitors to parallelism)
• Investigate other algorithms if possible & take advantage of optimized third-party parallel software.
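A short C sketch of why the Fibonacci recurrence resists parallelization (the array size is illustrative): each iteration is a loop-carried dependence on the two previous results, so the iterations cannot run concurrently.

#include <stdio.h>

int main(void) {
    long long f[50];
    f[0] = 0;
    f[1] = 1;
    /* Iteration n needs the results of iterations n-1 and n-2,
       so the loop must execute in order. */
    for (int n = 2; n < 50; n++) {
        f[n] = f[n - 1] + f[n - 2];
    }
    printf("F(49) = %lld\n", f[49]);
    return 0;
}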
Moving from Serial to Parallel Code
Problem to solve → Decomposition → Subproblems/Tasks → Assignment → Parallel worker threads → Orchestration → Mapping → Processor Cores → Execution on Parallel Machine
• Decomposition: Data Decomposition or Task Decomposition
• Assignment: specifying the mechanism to divide work up among processes (Static or Dynamic)
• Orchestration: identify & resolve data & control dependencies
• Mapping: assign the work to processor cores for execution on the parallel machine
Moving from Serial to Parallel Code : Decomposition
Decomposition
Divide computation into smaller parts (tasks) that can be executed concurrently.
There are two basic ways to partition computational work among parallel tasks:
1. Domain/Data Decomposition
Data associated with the problem
is decomposed
More on Data Decomposition
For problems that operate on large amounts of data
Data is divided up between CPUs : Each CPU has its own chunk of
dataset to operate upon and then the results are collated.
Which data should we partition?
Input Data
Output Data
Intermediate Data
Ensure Load Balancing: equal-sized tasks do not necessarily mean equal-sized data sets.
• Static Load Balancing
• Dynamic Load Balancing
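A sketch of block data decomposition with MPI in C (sizes and the computed quantity are illustrative): each rank takes a contiguous chunk of the index range, works only on its own chunk, and the partial results are collated with a reduction. The equal-chunk split is a form of static load balancing.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000000;                 /* total amount of data */
    int chunk = N / size;                  /* equal chunk per CPU */
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;  /* last rank takes the remainder */

    double local = 0.0;
    for (int i = lo; i < hi; i++)          /* operate on this rank's chunk only */
        local += (double)i;

    double total;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}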
Moving from Serial to Parallel Code: Decomposition
2. Functional/Task Decomposition
The problem is decomposed according to the work that must be done.
Each task then performs a portion of the overall work.
Decomposition : Example
Dense Matrix-Vector Multiplication
Computing y[i] uses only the ith row of A and the vector b
Task => computing y[i]
Task size is uniform
No dependence between tasks
All tasks need b
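A minimal OpenMP sketch of this decomposition in C (matrix size and values are illustrative): one task per row, each using only row i of A and all of b, with no dependence between tasks.

#include <stdio.h>
#define N 4

int main(void) {
    double A[N][N], b[N], y[N];

    /* Fill A and b with sample values. */
    for (int i = 0; i < N; i++) {
        b[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = (double)(i + j);
    }

    /* Task i computes y[i]; tasks are uniform and independent,
       so the rows can be distributed across threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * b[j];
    }

    for (int i = 0; i < N; i++)
        printf("y[%d] = %.1f\n", i, y[i]);
    return 0;
}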
Decomposition : What to look for
• Does the partition define (at least an order of magnitude) more tasks than there are processors in your target computer? (Flexibility)
• Does the partition avoid redundant computation and storage requirements? (Scalability)
• Are tasks of comparable size? (Load Balancing)
• Does the number of tasks scale with problem size? An increase in problem size should increase the number of tasks rather than the size of individual tasks.
• Identify alternative partitions: you can maximize flexibility in subsequent design stages by considering alternatives now.
• Investigate both domain and functional decompositions.
Data Dependencies
• The order of statement execution affects the results of the program.
• Multiple uses of the same location(s) in storage by different tasks.
Loop carried dependence:
DO J = MYSTART,MYEND
A(J) = A(J-1) * 2.0
END DO
Loop independent data dependency:
Task 1      Task 2
------      ------
X = 2       X = 4
. . .       . . .
Y = X**2    Y = X**3
Data Dependencies
DO J = MYSTART,MYEND
A(J) = A(J-1) * 2.0
END DO
If task 2 computes A(J) and task 1 computes A(J-1):
• Distributed memory architecture - task 2 must obtain the value of A(J-1) from task 1 after task 1 finishes its computation.
• Shared memory architecture - task 2 must read A(J-1) after task 1 updates it.
Task 1      Task 2
------      ------
X = 2       X = 4
. . .       . . .
Y = X**2    Y = X**3
The value of Y is dependent on (a race condition):
• Distributed memory architecture - if or when the value of X is communicated between the tasks.
• Shared memory architecture - which task last stores the value of X.
Handling Dependencies
• Distributed memory architectures - communicate required data at synchronization points.
• Shared memory architectures - synchronize read/write operations between tasks.
• Data Dependencies: Mutual Exclusion via Locks & Critical Sections
• Task Dependencies: explicit or implicit synchronization points called Barriers
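A short OpenMP sketch of both mechanisms in C (the counter is illustrative): a critical section gives mutual exclusion on shared data, and a barrier is an explicit synchronization point that no thread passes until all threads reach it.

#include <stdio.h>

int main(void) {
    int counter = 0;

    #pragma omp parallel
    {
        /* Mutual exclusion: one thread at a time updates the shared counter. */
        #pragma omp critical
        counter++;

        /* Barrier: wait here until every thread has incremented. */
        #pragma omp barrier

        /* One thread reports; by now counter equals the team size. */
        #pragma omp single
        printf("threads that checked in: %d\n", counter);
    }
    return 0;
}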
Mapping
GOAL: assign the tasks/processes to processors while minimizing parallel processing overheads
• Maximize data locality
• Minimize volume of data exchange
• Minimize frequency of interactions
• Minimize contention and hot spots
• Overlap computation with interactions
• Selective data and computation replication
HPC Solution Stack
[Diagram: layers of an HPC solution, deployed in the cloud or in-house]
• Custom User Interface
• Application Libraries (MPI/OpenMP), Pthreads
• Debuggers, Dev and Perf Tools
• Cluster Management, Job Scheduling, Billing & Reporting
• Compute Servers, Management Servers, Storage, Interconnect

Thank You