Parallel Computing Platforms
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.
Topic Overview
Scope of Parallelism
Implicit Parallelism: Trends in Microprocessor Architectures
Pipelining and Superscalar Execution
Pipelining and Superscalar Execution
Pipelining and Superscalar Execution
Superscalar Execution: An Example   Example of a two-way superscalar execution of instructions.
Superscalar Execution: An Example
Superscalar Execution
Superscalar Execution: Issue Mechanisms
Superscalar Execution: Efficiency Considerations
Very Long Instruction Word (VLIW) Processors
Very Long Instruction Word (VLIW) Processors: Considerations
Limitations of Memory System Performance
Memory System Performance: Bandwidth and Latency
Memory Latency: An Example
Memory Latency: An Example
Improving Effective Memory Latency Using Caches
Impact of Caches: Example
Impact of Caches: Example (continued)
Impact of Caches
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example
Impact of Memory Bandwidth
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example
Impact of Memory Bandwidth: Example Multiplying a matrix with a vector: (a) multiplying column-by-column, keeping a running sum; (b) computing each element of the result as a dot product of a row of the matrix with the vector.
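The two traversal orders named in the caption can be sketched as follows. This is a minimal illustration, assuming a square matrix stored row-major in a flat list; the function names and the storage layout are assumptions made for the example, not taken from the slides. Both orders produce the same result, but (a) touches elements that are n apart on consecutive inner iterations, while (b) touches adjacent elements.

```python
def matvec_by_columns(a, x, n):
    """(a) Walk the matrix column by column, keeping running sums in y."""
    y = [0.0] * n
    for j in range(n):          # column index
        for i in range(n):      # row index: consecutive accesses are n elements apart
            y[i] += a[i * n + j] * x[j]
    return y


def matvec_by_rows(a, x, n):
    """(b) Compute each y[i] as the dot product of row i with x."""
    y = [0.0] * n
    for i in range(n):          # row index
        total = 0.0
        for j in range(n):      # column index: consecutive accesses are adjacent
            total += a[i * n + j] * x[j]
        y[i] = total
    return y
```

In an interpreted language the stride difference is largely hidden by interpreter overhead; the sketch is meant to show the access pattern, not to measure it.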
Impact of Memory Bandwidth: Example
Memory System Performance: Summary
Alternate Approaches for Hiding Memory Latency
Multithreading for Latency Hiding
Multithreading for Latency Hiding: Example
Multithreading for Latency Hiding
Prefetching for Latency Hiding
Tradeoffs of Multithreading and Prefetching
Tradeoffs of Multithreading and Prefetching
Explicitly Parallel Platforms
Dichotomy of Parallel Computing Platforms
Control Structure of Parallel Programs
Control Structure of Parallel Programs
SIMD and MIMD Processors A typical SIMD architecture (a) and a typical MIMD architecture (b).
SIMD Processors
Conditional Execution in SIMD Processors  Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
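The two-step execution described in the caption can be simulated with an activity mask. The conditional itself is not reproduced in this transcript, so the sketch below assumes a statement of the form "if B == 0 then C = A else C = A / B"; the function name is hypothetical and the operand values in the usage note are illustrative only.

```python
def simd_conditional(a, b):
    """Simulate `if b == 0: c = a else: c = a / b` on lock-step processing elements."""
    n = len(a)
    c = [None] * n
    mask = [b[i] == 0 for i in range(n)]   # per-element activity mask

    # Step 1: only elements whose condition holds execute; the rest idle.
    for i in range(n):
        if mask[i]:
            c[i] = a[i]

    # Step 2: the mask is inverted and the else-branch executes.
    for i in range(n):
        if not mask[i]:
            c[i] = a[i] / b[i]

    return c, mask
```

Calling `simd_conditional([5, 4, 1, 0], [0, 2, 1, 0])` activates elements 0 and 3 in the first step and elements 1 and 2 in the second, so each processing element is idle for one of the two steps.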
MIMD Processors
SIMD-MIMD Comparison
Communication Model of Parallel Platforms
Shared-Address-Space Platforms
NUMA and UMA Shared-Address-Space Platforms Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
NUMA and UMA Shared-Address-Space Platforms
Shared-Address-Space vs. Shared Memory Machines
Message-Passing Platforms
Message Passing vs. Shared Address Space Platforms
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
Architecture of an Ideal Parallel Computer
Architecture of an Ideal Parallel Computer
Physical Complexity of an Ideal Parallel Computer
Interconnection Networks for Parallel Computers
Static and Dynamic Interconnection Networks  Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Interconnection Networks
Interconnection Networks: Network Interfaces
Network Topologies
Network Topologies: Buses
Network Topologies: Buses  Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
Network Topologies: Crossbars A completely non-blocking crossbar network connecting p processors to b memory banks. A crossbar network uses a p × b grid of switches to connect p inputs to b outputs in a non-blocking manner.
Network Topologies: Crossbars
Network Topologies: Multistage Networks
Network Topologies:  Multistage Networks The schematic of a typical multistage interconnection network.
Network Topologies: Multistage Omega Network
Network Topologies:  Multistage Omega Network Each stage of the Omega network implements a perfect shuffle as follows: A perfect shuffle interconnection for eight inputs and outputs.
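The perfect shuffle can be written down directly. In the sketch below, input i is wired to output 2i when i lies in the lower half and to 2i + 1 - p otherwise, which is the same as rotating the binary label of i left by one bit. The function names are chosen for the example, and p is assumed to be a power of two.

```python
def perfect_shuffle(i, p):
    """Output index reached from input i in a p-way perfect shuffle (p a power of two)."""
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p


def rotate_left(i, p):
    """Same mapping, written as a one-bit left rotation of the log2(p)-bit label."""
    bits = p.bit_length() - 1
    return ((i << 1) | (i >> (bits - 1))) & (p - 1)
```

For eight inputs, `[perfect_shuffle(i, 8) for i in range(8)]` gives `[0, 2, 4, 6, 1, 3, 5, 7]`, and `rotate_left` agrees on every input.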
Network Topologies: Multistage Omega Network Two switching configurations of the 2 × 2 switch: (a) Pass-through; (b) Cross-over.
Network Topologies: Multistage Omega Network Combining the perfect shuffle interconnects with the switches gives a complete omega network; the figure shows one connecting eight inputs and eight outputs. An omega network has (p/2) × log p switching nodes, and the cost of such a network grows as Θ(p log p).
Network Topologies: Multistage Omega Network – Routing
Network Topologies: Multistage Omega Network – Routing An example of blocking in an omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
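The blocking example can be reproduced with a small simulation. The sketch below assumes the usual destination-tag rule: each stage applies a perfect shuffle, and the switch then sets the lowest bit of the position to the next most significant bit of the destination (pass-through if it already matches, cross-over otherwise). The function names are hypothetical, and whether the shared link found here is the one labeled AB in the figure cannot be confirmed from the caption alone.

```python
def omega_path(src, dst, p):
    """Output-port labels used after each stage when routing src -> dst."""
    stages = p.bit_length() - 1          # log2(p) stages
    pos, path = src, []
    for k in range(stages):
        pos = ((pos << 1) | (pos >> (stages - 1))) & (p - 1)   # perfect shuffle
        dst_bit = (dst >> (stages - 1 - k)) & 1                # k-th most significant bit
        pos = (pos & ~1) | dst_bit       # switch: pass-through or cross-over
        path.append(pos)
    return path


def first_conflict(path_a, path_b):
    """Index of the first stage whose output link both messages need, else None."""
    for stage, (a, b) in enumerate(zip(path_a, path_b)):
        if a == b:
            return stage
    return None
```

Routing 010 to 111 and 110 to 100 in an eight-input network, both messages need output 101 of the first stage, so one of them has to wait.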
Network Topologies: Completely Connected Network
Network Topologies: Completely Connected and Star Connected Networks Example of an 8-node completely connected network. (a) A completely-connected network of eight nodes;  (b) a star connected network of nine nodes.
Network Topologies: Star Connected Network
Network Topologies: Linear Arrays, Meshes, and k-d Meshes
Network Topologies: Linear Arrays Linear arrays: (a) with no wraparound links; (b) with wraparound link.
Network Topologies:  Two- and Three Dimensional Meshes Two and three dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
Network Topologies:  Hypercubes and their Construction Construction of hypercubes from hypercubes of lower dimension.
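The recursive construction can be sketched as follows: a d-dimensional hypercube is built from two (d-1)-dimensional hypercubes by prefixing the labels of one copy with 0 and the other with 1, then linking corresponding nodes. The function name and the edge-set representation are assumptions made for the example.

```python
def hypercube_edges(d):
    """Edge set of a d-dimensional hypercube, built recursively."""
    if d == 0:
        return set()                      # a single node, no links
    lower = hypercube_edges(d - 1)
    half = 1 << (d - 1)
    edges = set(lower)                                        # copy with leading bit 0
    edges |= {(u + half, v + half) for (u, v) in lower}       # copy with leading bit 1
    edges |= {(u, u + half) for u in range(half)}             # links joining the copies
    return edges
```

Every edge produced this way joins two labels that differ in exactly one bit, and a hypercube of p = 2^d nodes ends up with (p log p)/2 links.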
Network Topologies: Properties of Hypercubes
Network Topologies: Tree-Based Networks Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
Network Topologies: Tree Properties
Network Topologies: Fat Trees A fat tree network of 16 processing nodes.
Evaluating Static Interconnection Networks
Evaluating Static Interconnection Networks [Table. Columns: Network; Diameter; Bisection Width; Arc Connectivity; Cost (No. of links). Rows: Completely-connected; Star; Complete binary tree; Linear array; 2-D mesh, no wraparound; 2-D wraparound mesh; Hypercube; Wraparound k-ary d-cube.]
Evaluating Dynamic Interconnection Networks [Table. Columns: Network; Diameter; Bisection Width; Arc Connectivity; Cost (No. of links). Rows: Crossbar; Omega Network; Dynamic Tree.]
Cache Coherence in Multiprocessor Systems
Cache Coherence in Multiprocessor Systems Cache coherence in multiprocessor systems: (a) Invalidate protocol; (b) Update protocol for shared variables. When the value of a variable is changed, all its copies must either be invalidated or updated.
Cache Coherence: Update and Invalidate Protocols
Maintaining Coherence Using Invalidate Protocols
Maintaining Coherence  Using Invalidate Protocols State diagram of a simple three-state coherence protocol.
Maintaining Coherence  Using Invalidate Protocols  Example of parallel program execution with the simple three-state coherence protocol.
Snoopy Cache Systems How are invalidates sent to the right processors? In snoopy cache systems, the processors share a broadcast medium (typically a bus); each cache listens to all invalidates and read requests on it and performs the appropriate coherence operations locally. A simple snoopy bus-based cache coherence system.
Performance of Snoopy Caches
Directory Based Systems
Directory Based Systems Architecture of typical directory based systems: (a) a centralized directory; and (b) a distributed directory.
Performance of Directory Based Schemes
Communication Costs in Parallel Machines
Message Passing Costs in Parallel Computers
Store-and-Forward Routing
Routing Techniques Passing a message from node P0 to P3: (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
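The difference between the two schemes can be put in numbers with the standard cost expressions. The parameter names below (t_s for startup time, t_h for per-hop time, t_w for per-word transfer time, m for message size in words, l for the number of links traversed) are assumptions for this sketch, since the slide text defining them is not preserved in this transcript. As in the caption, the startup time defaults to zero.

```python
def store_and_forward_time(m, l, t_s=0.0, t_h=1.0, t_w=1.0):
    """Each of the l hops receives the whole m-word message before forwarding it."""
    return t_s + (m * t_w + t_h) * l


def cut_through_time(m, l, t_s=0.0, t_h=1.0, t_w=1.0):
    """The header pays the per-hop cost on each link; the body is pipelined behind it."""
    return t_s + l * t_h + m * t_w
```

With a 4-word message over the three links from P0 to P3 and unit costs, store-and-forward takes 15 time units against 7 for cut-through; the gap widens as either the message size or the hop count grows.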
Packet Routing
Cut-Through Routing
Cut-Through Routing
Simplified Cost Model for Communicating Messages
Simplified Cost Model for Communicating Messages
Cost Models for Shared Address Space Machines
Routing Mechanisms for Interconnection Networks
Routing Mechanisms for Interconnection Networks Routing a message from node P_s (010) to node P_d (111) in a three-dimensional hypercube using E-cube routing.
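E-cube routing can be sketched in a few lines: at each node, XOR the current label with the destination label and move along the dimension of the least significant nonzero bit. The function name is chosen for the example.

```python
def ecube_route(src, dst):
    """Sequence of node labels visited when routing src -> dst with E-cube routing."""
    path = [src]
    node = src
    while node != dst:
        diff = node ^ dst          # bits in which the two labels still differ
        lowest = diff & -diff      # isolate the least significant differing bit
        node ^= lowest             # traverse the link along that dimension
        path.append(node)
    return path
```

For the message in the caption, `ecube_route(0b010, 0b111)` visits 010, 011, and 111: the labels differ in bits 0 and 2, and the lower dimension is corrected first.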
Mapping Techniques for Graphs
Mapping Techniques for Graphs: Metrics
Embedding a Linear Array into a Hypercube
Embedding a Linear Array into a Hypercube
Embedding a Linear Array into a Hypercube: Example (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
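The ring embedding can be checked with a short sketch. It uses the closed form i XOR (i >> 1) for the reflected Gray code, which generates the same sequence as the recursive reflect-and-prefix definition; the function names are chosen for the example.

```python
def gray(i):
    """i-th codeword of the binary reflected Gray code."""
    return i ^ (i >> 1)


def embed_ring(d):
    """Hypercube label assigned to each node of a 2**d-node ring, in ring order."""
    return [gray(i) for i in range(1 << d)]
```

For d = 3 the ring nodes map to 000, 001, 011, 010, 110, 111, 101, 100. Consecutive labels, including the wraparound pair, differ in exactly one bit, so every ring link lands on a single hypercube link.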
Embedding a Mesh into a Hypercube
Embedding a Mesh into a Hypercube (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are all 1.
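The mesh embedding can be sketched by concatenating Gray codes: node (i, j) of a 2^r × 2^s mesh is assigned the Gray code of i followed by the s-bit Gray code of j. The function names and the argument order are assumptions made for the example.

```python
def gray(i):
    """i-th codeword of the binary reflected Gray code."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, s):
    """Hypercube label for mesh node (i, j); s is the number of bits in the column code."""
    return (gray(i) << s) | gray(j)
```

For a 4 × 4 mesh (s = 2) the 16 mesh nodes receive 16 distinct four-bit labels, and horizontally or vertically adjacent nodes differ in exactly one bit, which is what gives the mapping a dilation of 1.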
Embedding a Mesh into a Linear Array
Embedding a Mesh into a Linear Array: Example (a) Embedding a 16-node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
Embedding a Hypercube into a 2-D Mesh
Embedding a Hypercube into a 2-D Mesh: Example  Embedding a hypercube into a 2-D mesh.
Case Studies:  The IBM Blue-Gene Architecture  The hierarchical architecture of Blue Gene.
Case Studies:  The Cray T3E Architecture Interconnection network of the Cray T3E:  (a) node architecture; (b) network topology.
Case Studies:  The SGI Origin 3000 Architecture Architecture of the SGI Origin 3000 family of servers.
Case Studies:  The Sun HPC Server Architecture Architecture of the Sun Enterprise family of servers.

 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Chap2 slides

  • 1. Parallel Computing Platforms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.
  • 8. Superscalar Execution: An Example. Example of a two-way superscalar execution of instructions.
  • 42. SIMD and MIMD Processors. A typical SIMD architecture (a) and a typical MIMD architecture (b).
  • 44. Conditional Execution in SIMD Processors. Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
  • 49. NUMA and UMA Shared-Address-Space Platforms. Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
  • 60. Static and Dynamic Interconnection Networks. Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
  • 65. Network Topologies: Buses. Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
  • 66. Network Topologies: Crossbars. A completely non-blocking crossbar network connecting p processors to b memory banks. A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner.
  • 69. Network Topologies: Multistage Networks. The schematic of a typical multistage interconnection network.
  • 71. Network Topologies: Multistage Omega Network. Each stage of the Omega network implements a perfect shuffle, illustrated by a perfect shuffle interconnection for eight inputs and outputs.
  • 73. Network Topologies: Multistage Omega Network. A complete Omega network with the perfect shuffle interconnects and switches can now be illustrated: a complete omega network connecting eight inputs and eight outputs. An omega network has p/2 × log p switching nodes, and the cost of such a network grows as Θ(p log p).
  • 75. Network Topologies: Multistage Omega Network – Routing. An example of blocking in an omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
  • 77. Network Topologies: Completely Connected and Star Connected Networks. Example of an 8-node completely connected network. (a) A completely connected network of eight nodes; (b) a star connected network of nine nodes.
  • 80. Network Topologies: Linear Arrays. Linear arrays: (a) with no wraparound links; (b) with wraparound link.
  • 81. Network Topologies: Two- and Three-Dimensional Meshes. Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
  • 82. Network Topologies: Hypercubes and their Construction. Construction of hypercubes from hypercubes of lower dimension.
  • 84. Network Topologies: Tree-Based Networks. Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
  • 86. Network Topologies: Fat Trees. A fat tree network of 16 processing nodes.
  • 88. Evaluating Static Interconnection Networks. Table columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Table rows: Completely-connected; Star; Complete binary tree; Linear array; 2-D mesh, no wraparound; 2-D wraparound mesh; Hypercube; Wraparound k-ary d-cube.
  • 89. Evaluating Dynamic Interconnection Networks. Table columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Table rows: Crossbar; Omega Network; Dynamic Tree.
  • 91. Cache Coherence in Multiprocessor Systems. Cache coherence in multiprocessor systems: (a) Invalidate protocol; (b) Update protocol for shared variables. When the value of a variable is changed, all its copies must either be invalidated or updated.
  • 94. Maintaining Coherence Using Invalidate Protocols. State diagram of a simple three-state coherence protocol.
  • 95. Maintaining Coherence Using Invalidate Protocols. Example of parallel program execution with the simple three-state coherence protocol.
  • 96. Snoopy Cache Systems. How are invalidates sent to the right processors? In snoopy cache systems, each cache listens on a broadcast medium (such as a bus) to all invalidates and read requests and performs the appropriate coherence operations locally. Figure: a simple snoopy bus-based cache coherence system.
  • 99. Directory Based Systems. Architecture of typical directory based systems: (a) a centralized directory; and (b) a distributed directory.
  • 104. Routing Techniques. Passing a message from node P0 to P3: (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
  • 112. Routing Mechanisms for Interconnection Networks. Routing a message from node Ps (010) to node Pd (111) in a three-dimensional hypercube using E-cube routing.
  • 117. Embedding a Linear Array into a Hypercube: Example. (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
  • 119. Embedding a Mesh into a Hypercube. (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are all 1.
  • 121. Embedding a Mesh into a Linear Array: Example. (a) Embedding a 16-node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
  • 123. Embedding a Hypercube into a 2-D Mesh: Example. Embedding a hypercube into a 2-D mesh.
  • 124. Case Studies: The IBM Blue Gene Architecture. The hierarchical architecture of Blue Gene.
  • 125. Case Studies: The Cray T3E Architecture. Interconnection network of the Cray T3E: (a) node architecture; (b) network topology.
  • 126. Case Studies: The SGI Origin 3000 Architecture. Architecture of the SGI Origin 3000 family of servers.
  • 127. Case Studies: The Sun HPC Server Architecture. Architecture of the Sun Enterprise family of servers.
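
The omega network slides (71, 73 and 75) mention the perfect shuffle, the p/2 × log p switch count, and a blocking example for the messages 010 to 111 and 110 to 100. The following is a minimal Python sketch of those ideas, not the deck's own code. It assumes the usual formulation in which the perfect shuffle is a one-bit left rotation of the label and each switch sets the low-order bit to the next destination bit (most significant bit first). The function names are illustrative.

```python
def perfect_shuffle(i: int, bits: int) -> int:
    """Left-rotate the bits-wide label i by one position."""
    mask = (1 << bits) - 1
    return ((i << 1) & mask) | (i >> (bits - 1))


def omega_switch_count(p: int) -> int:
    """Number of 2x2 switches: p/2 per stage, log2(p) stages (p a power of two)."""
    stages = p.bit_length() - 1
    return (p // 2) * stages


def omega_route(src: int, dst: int, bits: int) -> list[int]:
    """Return the output-link label a message occupies after each stage."""
    pos = src
    links = []
    for stage in range(bits):
        pos = perfect_shuffle(pos, bits)
        # The switch forwards the message to the output whose low bit
        # equals the stage-th most significant bit of the destination.
        dst_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        links.append(pos)
    return links


def conflicts(route_a: list[int], route_b: list[int]) -> list[tuple[int, int]]:
    """Stages (and link labels) at which two messages need the same link."""
    return [(s, a) for s, (a, b) in enumerate(zip(route_a, route_b)) if a == b]


if __name__ == "__main__":
    print([perfect_shuffle(i, 3) for i in range(8)])
    print(omega_switch_count(8))
    a = omega_route(0b010, 0b111, 3)
    b = omega_route(0b110, 0b100, 3)
    print(a, b, conflicts(a, b))
```

Under these assumptions the two messages contend for the same output of the first stage, which is consistent with the blocking that slide 75 describes.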
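
Slide 112 routes a message from Ps (010) to Pd (111) in a three-dimensional hypercube using E-cube routing. The following is a rough Python sketch of that scheme. It assumes the common definition in which each hop corrects the least significant bit where the current node and the destination differ. The function name `ecube_route` is illustrative.

```python
def ecube_route(src: int, dst: int) -> list[int]:
    """Return the sequence of hypercube node labels visited from src to dst."""
    path = [src]
    cur = src
    diff = cur ^ dst  # bits in which the current node and destination differ
    while diff:
        lowest = diff & -diff  # isolate the least significant differing bit
        cur ^= lowest          # traverse the link along that dimension
        path.append(cur)
        diff = cur ^ dst
    return path


if __name__ == "__main__":
    print([format(n, "03b") for n in ecube_route(0b010, 0b111)])
```

Because dimensions are always corrected in the same fixed order, the route between any pair of nodes is deterministic and its length equals the number of differing bits.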
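
Slides 117 and 119 embed a ring and a mesh into a hypercube using reflected Gray codes. The sketch below shows one way to check that such a mapping has dilation 1. It uses the closed form `i ^ (i >> 1)` for the binary reflected Gray code. This is a standard equivalent chosen here for brevity and is not necessarily the formulation used in the deck. The helper names are illustrative.

```python
def gray(i: int) -> int:
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def ring_embedding(d: int) -> list[int]:
    """Hypercube label assigned to each node of a 2^d-node ring."""
    return [gray(i) for i in range(1 << d)]


def ring_dilation(labels: list[int]) -> int:
    """Longest hypercube distance spanned by any ring link (with wraparound)."""
    n = len(labels)
    return max(hamming(labels[i], labels[(i + 1) % n]) for i in range(n))


def mesh_embedding(r_bits: int, s_bits: int) -> list[list[int]]:
    """Map node (i, j) of a 2^r x 2^s mesh to the label gray(i) || gray(j)."""
    return [[(gray(i) << s_bits) | gray(j) for j in range(1 << s_bits)]
            for i in range(1 << r_bits)]


if __name__ == "__main__":
    ring = ring_embedding(3)
    print([format(x, "03b") for x in ring], ring_dilation(ring))
    for row in mesh_embedding(1, 2):  # 2 x 4 mesh into a 3-D hypercube
        print([format(x, "03b") for x in row])
```

Every ring or mesh link maps to a single hypercube link, and every hypercube node is used exactly once, which matches the slide's claim that dilation and expansion are 1 for these mappings.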