SlideShare una empresa de Scribd logo
1 de 67
Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007
Single-core computer
Single-core CPU chip the single core
Multi-core architectures ,[object Object],Core 1 Core 2 Core 3 Core 4 Multi-core CPU chip
Multi-core CPU chip ,[object Object],[object Object],core 1 core 2 core 3 core 4
The cores run in parallel core 1 core 2 core 3 core 4 thread 1 thread 2 thread 3 thread 4
Within each core, threads are time-sliced (just like on a uniprocessor) core 1 core 2 core 3 core 4 several  threads several  threads several  threads several  threads
Interaction with the Operating System ,[object Object],[object Object],[object Object]
Why multi-core ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Instruction-level parallelism ,[object Object],[object Object],[object Object]
Thread-level parallelism (TLP) ,[object Object],[object Object],[object Object],[object Object],[object Object]
General context: Multiprocessors ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Lemieux cluster, Pittsburgh  supercomputing  center
Multiprocessor memory types ,[object Object],[object Object]
Multi-core processor is a special kind of a multiprocessor: All processors are on the same chip ,[object Object],[object Object]
What applications benefit  from multi-core? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Each can run on its own core
More examples ,[object Object],[object Object],[object Object],[object Object]
A technique complementary to multi-core: Simultaneous multithreading   ,[object Object],[object Object],[object Object],[object Object],Source: Intel BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus
Simultaneous multithreading (SMT) ,[object Object],[object Object],[object Object]
Without SMT, only a single thread can run at any given time BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1: floating point
Without SMT, only a single thread can run at any given time BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 2: integer operation
SMT processor: both threads can run concurrently BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1: floating point Thread 2: integer operation
But: Can’t simultaneously use  the same functional unit BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1 Thread 2 This scenario is impossible with SMT on a single core (assuming a single integer unit) IMPOSSIBLE
SMT not a “true” parallel processor ,[object Object],[object Object],[object Object],[object Object]
Multi-core:  threads can run on separate cores BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus Thread 1 Thread 2
Multi-core:  threads can run on separate cores BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus Thread 3 Thread 4
Combining Multi-core and SMT ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SMT Dual-core: all four threads can run concurrently BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode  ROM BTB L2 Cache and Control Bus Thread 1 Thread 3 Thread 2 Thread 4
Comparison: multi-core vs SMT ,[object Object]
Comparison: multi-core vs SMT ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The memory hierarchy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
“Fish” machines ,[object Object],[object Object],[object Object],[object Object],memory L2 cache L1 cache L1 cache C O R E 1 C O R E 0 hyper-threads
Designs with private L2 caches memory L2 cache L1 cache L1 cache C O R E 1 C O R E 0 L2 cache memory L2 cache L1 cache L1 cache C O R E 1 C O R E 0 L2 cache Both L1 and L2 are private Examples: AMD Opteron,  AMD Athlon, Intel Pentium D L3 cache L3 cache A design with L3 caches Example: Intel Itanium 2
Private vs shared caches? ,[object Object]
Private vs shared caches ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The cache coherence problem ,[object Object],[object Object]
The cache coherence problem Suppose variable x initially contains 15213 One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 1 reads x One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 2 reads x One or more  levels of  cache x=15213 One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 1 writes to x, setting it to 21660 One or more  levels of  cache x=21660 One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip assuming  write-through  caches Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 2 attempts to read x… gets a stale copy One or more  levels of  cache x=21660 One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
Solutions for cache coherence ,[object Object],[object Object],[object Object]
Inter-core bus One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache Main memory multi-core chip inter-core bus Core 1 Core 2 Core 3 Core 4
Invalidation protocol with snooping ,[object Object],[object Object]
The cache coherence problem Revisited: Cores 1 and 2 have both read x One or more  levels of  cache x=15213 One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 1 writes to x, setting it to 21660 One or more  levels of  cache x=21660 One or more  levels of  cache x=15213 One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip assuming  write-through  caches INVALIDATED sends invalidation request inter-core bus Core 1 Core 2 Core 3 Core 4
The cache coherence problem After invalidation: One or more  levels of  cache x=21660 One or more  levels of  cache One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
The cache coherence problem Core 2 reads x. Cache misses,   and loads the new copy. One or more  levels of  cache x=21660 One or more  levels of  cache x=21660 One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
Alternative to invalidate protocol: update protocol Core 1 writes x=21660: One or more  levels of  cache x=21660 One or more  levels of  cache x= 21660 One or more  levels of  cache One or more  levels of  cache Main memory x=21660 multi-core chip assuming  write-through  caches UPDATED broadcasts updated value inter-core bus Core 1 Core 2 Core 3 Core 4
Which do you think is better? Invalidation or update?
Invalidation vs update ,[object Object],[object Object],[object Object],[object Object]
Invalidation protocols ,[object Object],[object Object],[object Object]
Programming for multi-core ,[object Object],[object Object],[object Object],[object Object]
Thread safety very important ,[object Object],[object Object],[object Object]
However: Need to use synchronization even if only time-slicing on a uniprocessor ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Need to use synchronization even if only time-slicing on a uniprocessor ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],gives counter=2 gives counter=1
Assigning threads to the cores ,[object Object],[object Object],[object Object],[object Object]
Affinity masks are bit vectors ,[object Object],1 0 1 1 core 3 core 2 core 1 core 0 ,[object Object]
Affinity masks when multi-core and SMT combined ,[object Object],[object Object],1 core 3 core 2 core 1 core 0 1 0 0 1 0 1 1 thread 1 ,[object Object],[object Object],thread 0 thread 1 thread 0 thread 1 thread 0 thread 1 thread 0
Default Affinities ,[object Object],[object Object],[object Object]
Process migration is costly ,[object Object],[object Object],[object Object],[object Object]
Hard affinities ,[object Object],[object Object]
When to set your own affinities ,[object Object],[object Object],[object Object],Source: Sensable.com
Kernel scheduler API ,[object Object],[object Object],[object Object],[object Object]
Kernel scheduler API ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Windows Task Manager core 2 core 1
Legal licensing issues ,[object Object],[object Object]
Conclusion ,[object Object],[object Object],[object Object]

Más contenido relacionado

La actualidad más candente

Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
Uday Sharma
 
chap 18 multicore computers
chap 18 multicore computers chap 18 multicore computers
chap 18 multicore computers
Sher Shah Merkhel
 

La actualidad más candente (20)

Parallel processing
Parallel processingParallel processing
Parallel processing
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Multicore Computers
Multicore ComputersMulticore Computers
Multicore Computers
 
1 Computer Architecture
1 Computer Architecture1 Computer Architecture
1 Computer Architecture
 
Introduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technologyIntroduction to ARM big.LITTLE technology
Introduction to ARM big.LITTLE technology
 
Computer Organization and Design
Computer Organization and DesignComputer Organization and Design
Computer Organization and Design
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Intel Core I5
Intel Core I5Intel Core I5
Intel Core I5
 
Cuda Architecture
Cuda ArchitectureCuda Architecture
Cuda Architecture
 
Computer architecture multi processor
Computer architecture multi processorComputer architecture multi processor
Computer architecture multi processor
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Architecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPUArchitecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPU
 
chap 18 multicore computers
chap 18 multicore computers chap 18 multicore computers
chap 18 multicore computers
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Micro controller and dsp processor
Micro controller and dsp processorMicro controller and dsp processor
Micro controller and dsp processor
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
CUDA Architecture
CUDA ArchitectureCUDA Architecture
CUDA Architecture
 

Destacado

Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOF
Ofer Rosenberg
 

Destacado (19)

IBM z/OS V2R2 Networking Technologies Update
IBM z/OS V2R2 Networking Technologies UpdateIBM z/OS V2R2 Networking Technologies Update
IBM z/OS V2R2 Networking Technologies Update
 
Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOF
 
Ludden q3 2008_boston
Ludden q3 2008_bostonLudden q3 2008_boston
Ludden q3 2008_boston
 
Embedded Solutions 2010: Intel Multicore by Eastronics
Embedded Solutions 2010:  Intel Multicore by Eastronics Embedded Solutions 2010:  Intel Multicore by Eastronics
Embedded Solutions 2010: Intel Multicore by Eastronics
 
IBM z/OS V2R2 Performance and Availability Topics
IBM z/OS V2R2 Performance and Availability TopicsIBM z/OS V2R2 Performance and Availability Topics
IBM z/OS V2R2 Performance and Availability Topics
 
z/OS V2R2 Enhancements
z/OS V2R2 Enhancementsz/OS V2R2 Enhancements
z/OS V2R2 Enhancements
 
Cache & CPU performance
Cache & CPU performanceCache & CPU performance
Cache & CPU performance
 
可靠分布式系统基础 Paxos的直观解释
可靠分布式系统基础 Paxos的直观解释可靠分布式系统基础 Paxos的直观解释
可靠分布式系统基础 Paxos的直观解释
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
SMP/Multithread
SMP/MultithreadSMP/Multithread
SMP/Multithread
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Computex 2014 AMD Press Conference
Computex 2014 AMD Press ConferenceComputex 2014 AMD Press Conference
Computex 2014 AMD Press Conference
 
AMD Ryzen CPU Zen Cores Architecture
AMD Ryzen CPU Zen Cores ArchitectureAMD Ryzen CPU Zen Cores Architecture
AMD Ryzen CPU Zen Cores Architecture
 
Linux Performance Analysis and Tools
Linux Performance Analysis and ToolsLinux Performance Analysis and Tools
Linux Performance Analysis and Tools
 
KVM and docker LXC Benchmarking with OpenStack
KVM and docker LXC Benchmarking with OpenStackKVM and docker LXC Benchmarking with OpenStack
KVM and docker LXC Benchmarking with OpenStack
 

Similar a Multi-core architectures

5.6 Basic computer structure microprocessors
5.6 Basic computer structure   microprocessors5.6 Basic computer structure   microprocessors
5.6 Basic computer structure microprocessors
lpapadop
 
Final draft intel core i5 processors architecture
Final draft intel core i5 processors architectureFinal draft intel core i5 processors architecture
Final draft intel core i5 processors architecture
Jawid Ahmad Baktash
 
fundamentals of digital communication Unit 5_microprocessor.pdf
fundamentals of digital communication Unit 5_microprocessor.pdffundamentals of digital communication Unit 5_microprocessor.pdf
fundamentals of digital communication Unit 5_microprocessor.pdf
shubhangisonawane6
 
Paralle programming 2
Paralle programming 2Paralle programming 2
Paralle programming 2
Anshul Sharma
 

Similar a Multi-core architectures (20)

27 multicore
27 multicore27 multicore
27 multicore
 
27 multicore
27 multicore27 multicore
27 multicore
 
multi-core Processor.ppt for IGCSE ICT and Computer Science Students
multi-core Processor.ppt for IGCSE ICT and Computer Science Studentsmulti-core Processor.ppt for IGCSE ICT and Computer Science Students
multi-core Processor.ppt for IGCSE ICT and Computer Science Students
 
Osa-multi-core.ppt
Osa-multi-core.pptOsa-multi-core.ppt
Osa-multi-core.ppt
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine Learning
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
 
Multi-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureMulti-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architecture
 
5.6 Basic computer structure microprocessors
5.6 Basic computer structure   microprocessors5.6 Basic computer structure   microprocessors
5.6 Basic computer structure microprocessors
 
Multi-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IKMulti-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IK
 
Memory Mapping Cache
Memory Mapping CacheMemory Mapping Cache
Memory Mapping Cache
 
Lec04 gpu architecture
Lec04 gpu architectureLec04 gpu architecture
Lec04 gpu architecture
 
Lecture 4.pptx
Lecture 4.pptxLecture 4.pptx
Lecture 4.pptx
 
Intro To .Net Threads
Intro To .Net ThreadsIntro To .Net Threads
Intro To .Net Threads
 
Final draft intel core i5 processors architecture
Final draft intel core i5 processors architectureFinal draft intel core i5 processors architecture
Final draft intel core i5 processors architecture
 
fundamentals of digital communication Unit 5_microprocessor.pdf
fundamentals of digital communication Unit 5_microprocessor.pdffundamentals of digital communication Unit 5_microprocessor.pdf
fundamentals of digital communication Unit 5_microprocessor.pdf
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Paralle programming 2
Paralle programming 2Paralle programming 2
Paralle programming 2
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 

Más de nextlib

Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
nextlib
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
nextlib
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
nextlib
 
Social Graph
Social GraphSocial Graph
Social Graph
nextlib
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
nextlib
 
SVD review
SVD reviewSVD review
SVD review
nextlib
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
nextlib
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
nextlib
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
nextlib
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systems
nextlib
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
nextlib
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
nextlib
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
nextlib
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
nextlib
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
nextlib
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Bigtable
BigtableBigtable
Bigtable
nextlib
 

Más de nextlib (20)

Nio
NioNio
Nio
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
 
Social Graph
Social GraphSocial Graph
Social Graph
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
 
Closures for Java
Closures for JavaClosures for Java
Closures for Java
 
A Content-Driven Reputation System for the Wikipedia
A Content-Driven Reputation System for the WikipediaA Content-Driven Reputation System for the Wikipedia
A Content-Driven Reputation System for the Wikipedia
 
SVD review
SVD reviewSVD review
SVD review
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Bigtable
BigtableBigtable
Bigtable
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Multi-core architectures

  • 1. Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007
  • 3. Single-core CPU chip the single core
  • 4.
  • 5.
  • 6. The cores run in parallel core 1 core 2 core 3 core 4 thread 1 thread 2 thread 3 thread 4
  • 7. Within each core, threads are time-sliced (just like on a uniprocessor) core 1 core 2 core 3 core 4 several threads several threads several threads several threads
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19. Without SMT, only a single thread can run at any given time BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1: floating point
  • 20. Without SMT, only a single thread can run at any given time BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 2: integer operation
  • 21. SMT processor: both threads can run concurrently BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1: floating point Thread 2: integer operation
  • 22. But: Can’t simultaneously use the same functional unit BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1 Thread 2 This scenario is impossible with SMT on a single core (assuming a single integer unit) IMPOSSIBLE
  • 23.
  • 24. Multi-core: threads can run on separate cores BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1 Thread 2
  • 25. Multi-core: threads can run on separate cores BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 3 Thread 4
  • 26.
  • 27. SMT Dual-core: all four threads can run concurrently BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus BTB and I-TLB Decoder Trace Cache Rename/Alloc Uop queues Schedulers Integer Floating Point L1 D-Cache D-TLB uCode ROM BTB L2 Cache and Control Bus Thread 1 Thread 3 Thread 2 Thread 4
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. Designs with private L2 caches memory L2 cache L1 cache L1 cache C O R E 1 C O R E 0 L2 cache memory L2 cache L1 cache L1 cache C O R E 1 C O R E 0 L2 cache Both L1 and L2 are private Examples: AMD Opteron, AMD Athlon, Intel Pentium D L3 cache L3 cache A design with L3 caches Example: Intel Itanium 2
  • 33.
  • 34.
  • 35.
  • 36. The cache coherence problem Suppose variable x initially contains 15213 One or more levels of cache One or more levels of cache One or more levels of cache One or more levels of cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 37. The cache coherence problem Core 1 reads x One or more levels of cache x=15213 One or more levels of cache One or more levels of cache One or more levels of cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 38. The cache coherence problem Core 2 reads x One or more levels of cache x=15213 One or more levels of cache x=15213 One or more levels of cache One or more levels of cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 39. The cache coherence problem Core 1 writes to x, setting it to 21660 One or more levels of cache x=21660 One or more levels of cache x=15213 One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip assuming write-through caches Core 1 Core 2 Core 3 Core 4
  • 40. The cache coherence problem Core 2 attempts to read x… gets a stale copy One or more levels of cache x=21660 One or more levels of cache x=15213 One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 41.
  • 42. Inter-core bus One or more levels of cache One or more levels of cache One or more levels of cache One or more levels of cache Main memory multi-core chip inter-core bus Core 1 Core 2 Core 3 Core 4
  • 43.
  • 44. The cache coherence problem Revisited: Cores 1 and 2 have both read x One or more levels of cache x=15213 One or more levels of cache x=15213 One or more levels of cache One or more levels of cache Main memory x=15213 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 45. The cache coherence problem Core 1 writes to x, setting it to 21660 One or more levels of cache x=21660 One or more levels of cache x=15213 One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip assuming write-through caches INVALIDATED sends invalidation request inter-core bus Core 1 Core 2 Core 3 Core 4
  • 46. The cache coherence problem After invalidation: One or more levels of cache x=21660 One or more levels of cache One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 47. The cache coherence problem Core 2 reads x. Cache misses, and loads the new copy. One or more levels of cache x=21660 One or more levels of cache x=21660 One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip Core 1 Core 2 Core 3 Core 4
  • 48. Alternative to invalidate protocol: update protocol Core 1 writes x=21660: One or more levels of cache x=21660 One or more levels of cache x= 21660 One or more levels of cache One or more levels of cache Main memory x=21660 multi-core chip assuming write-through caches UPDATED broadcasts updated value inter-core bus Core 1 Core 2 Core 3 Core 4
  • 49. Which do you think is better? Invalidation or update?
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. Windows Task Manager core 2 core 1
  • 66.
  • 67.