ECE 4100/6100
Advanced Computer Architecture
Lecture 13 Multithreading and Multicore Processors
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
2
TLP
• ILP of a single program is hard to extract
– Large ILP is far-flung
– We are human after all: we program with a sequential mind
• Reality: running multiple threads or programs
• Thread Level Parallelism
– Time Multiplexing
– Throughput computing
– Multiple program workloads
– Multiple concurrent threads
– Helper threads to improve single program performance
3
Multi-Tasking Paradigm
• Virtual memory makes it easy
• A context switch can be expensive or require extra HW
– VIVT cache
– VIPT cache
– TLBs
[Figure: occupancy of functional units FU1-FU4 per execution time quantum on a conventional single-threaded superscalar. Legend: Thread 1 to Thread 5, Unused]
4
Multi-threading Paradigm
[Figure: occupancy of functional units FU1-FU4 over execution time for five designs: conventional single-threaded superscalar; fine-grained multithreading (cycle-by-cycle interleaving); coarse-grained multithreading (block interleaving); chip multiprocessor (CMP or multicore); simultaneous multithreading (SMT). Legend: Thread 1 to Thread 5, Unused]
5
Conventional Multithreading
• Zero-overhead context switch
• Duplicated contexts for threads
[Figure: register file holding four duplicated contexts (0:r0-0:r7, 1:r0-1:r7, 2:r0-2:r7, 3:r0-3:r7), selected by CtxtPtr; memory is shared by the threads]
6
Cycle Interleaving MT
• Per-cycle, Per-thread instruction fetching
• Examples: HEP, Horizon, Tera MTA, MIT M-machine
• Interesting questions to consider
– Does it need a sophisticated branch predictor?
– Or does it need any speculative execution at all?
• Get rid of “branch prediction”?
• Get rid of “predication”?
– Does it need any out-of-order execution capability?
7
Tera Multi-Threaded Architecture
• Cycle-by-cycle interleaving
• MTA can context-switch every cycle (3ns)
• As many as 128 distinct threads (hiding 384ns)
• 3-wide VLIW instruction format (M+ALU+ALU/Br)
• Each instruction has a 3-bit field for dependence lookahead
– Determines whether there is a dependency with subsequent instructions
– Allows up to 7 future VLIW instructions to execute (before a switch)
Loop:
nop r1=r2+r3 r5=r6+4 lookahead=1
nop r8=r9-r10 r11=r12-r13 lookahead=2
[r5]=r1 r4=r4-1 bnz Loop lookahead=0
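The latency-hiding numbers on this slide follow from simple arithmetic. The sketch below reproduces them, assuming strict round-robin interleaving (one instruction per thread per cycle); the function names are illustrative and do not come from the Tera documentation.

```python
# Figures taken from the slide.
CYCLE_NS = 3        # the MTA can context-switch every 3 ns cycle
MAX_THREADS = 128   # distinct hardware threads


def hidden_latency_ns(threads: int, cycle_ns: int = CYCLE_NS) -> int:
    """Time between two consecutive issue slots of the same thread,
    assuming strict round-robin, cycle-by-cycle interleaving."""
    return threads * cycle_ns


def threads_needed(latency_ns: int, cycle_ns: int = CYCLE_NS) -> int:
    """Smallest thread count whose round-robin period covers latency_ns."""
    return -(-latency_ns // cycle_ns)  # ceiling division


def max_lookahead(bits: int = 3) -> int:
    """Largest value a lookahead field of the given width can encode."""
    return (1 << bits) - 1
```

With the slide's numbers, `hidden_latency_ns(128)` gives the 384 ns quoted above, and a 3-bit field tops out at a lookahead of 7.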
8
Block Interleaving MT
• Context switch on a specific event (dynamic pipelining)
– Explicit switching: implementing a switch instruction
– Implicit switching: trigger when a specific instruction class fetched
• Static switching (switch upon fetching)
– Switch-on-memory-instructions: Rhamma processor
– Switch-on-branch or switch-on-hard-to-predict-branch
– Trigger can be implicit or explicit instruction
• Dynamic switching
– Switch-on-cache-miss (switch in later pipeline stage): MIT Sparcle (MIT Alewife’s node), Rhamma processor
– Switch-on-use (lazy strategy of switch-on-cache-miss)
• Wait until last minute
• Valid bit needed for each register
– Clear when load issued, set when data returned
– Switch-on-signal (e.g. interrupt)
– Predicated switch instruction based on conditions
• No need to support a large number of threads
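The per-register valid bit used by switch-on-use can be sketched as a small scoreboard. This is a minimal illustration of the clear-on-issue, set-on-return rule from the slide; the class and method names are hypothetical and not taken from any of the processors listed.

```python
from typing import Sequence


class RegisterScoreboard:
    """One valid bit per register, as the switch-on-use bullets describe."""

    def __init__(self, num_regs: int) -> None:
        self.valid = [True] * num_regs

    def issue_load(self, dest: int) -> None:
        # Valid bit is cleared when the load issues.
        self.valid[dest] = False

    def data_returned(self, dest: int) -> None:
        # Valid bit is set again when the data returns.
        self.valid[dest] = True

    def must_switch(self, sources: Sequence[int]) -> bool:
        """Switch threads only when an instruction reads a pending register."""
        return any(not self.valid[r] for r in sources)
```

A thread that has issued a load into r5 keeps running until an instruction actually names r5 as a source, which is the "wait until last minute" behaviour.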
NVidia Fermi GPGPU Architecture
Nvidia’s Streaming Multiprocessor (SM)
• SIMD execution model
• Issue one instruction from each warp to 16 CUDA cores
• One warp = 32 parallel threads
• Compute capability 2.0 allows 1536 resident threads (i.e., 48 warps) in one SM
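The warp counts above can be checked with a few lines of arithmetic. The number of passes per warp instruction is an assumption of this sketch: it supposes each CUDA core handles one thread of the warp per pass, which the slide does not state explicitly.

```python
WARP_SIZE = 32            # threads per warp (from the slide)
RESIDENT_THREADS = 1536   # per SM at compute capability 2.0 (from the slide)
CORES_PER_ISSUE = 16      # CUDA cores receiving one warp instruction


def resident_warps(threads: int = RESIDENT_THREADS,
                   warp_size: int = WARP_SIZE) -> int:
    """Number of warps that fit in the resident thread budget."""
    return threads // warp_size


def passes_per_warp_instruction(warp_size: int = WARP_SIZE,
                                cores: int = CORES_PER_ISSUE) -> int:
    """Passes needed to cover a whole warp, assuming one thread per core
    per pass (an assumption of this sketch)."""
    return -(-warp_size // cores)  # ceiling division
```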
11
Simultaneous Multithreading (SMT)
• SMT name first used by UW; earlier versions from UCSB [Nemirovsky, HICSS‘91] and [Hirata et al., ISCA-92]
• Intel’s HyperThreading (2-way SMT)
• IBM Power7 (4/6/8 cores, 4-way SMT); IBM Power5/6 (2 cores, each 2-way SMT, 4 chips per package); Power5 has OoO cores, Power6 has in-order cores
• Basic ideas: Conventional MT + Simultaneous issue + Sharing common resources
[Figure: SMT datapath. Per-thread PCs feed the fetch unit, I-cache, decode, and register renamer; RS & ROB plus physical register file; functional units ALU1, ALU2, FAdd (2 cycles), FMult (4 cycles), FDiv (unpipelined, 16 cycles), Load/Store (variable); D-cache; per-thread register files]
12
Instruction Fetching Policy
• FIFO, Round Robin, simple but may be too naive
• Adaptive Fetching Policies
– BRCOUNT (reduce wrong path issuing)
• Count # of br inst in decode/rename/IQ stages
• Give top priority to thread with the least BRCOUNT
– MISSCOUNT (reduce IQ clog)
• Count # of outstanding D-cache misses
• Give top priority to thread with the least MISSCOUNT
– ICOUNT (reduce IQ clog)
• Count # of inst in decode/rename/IQ stages
• Give top priority to thread with the least ICOUNT
– IQPOSN (reduce IQ clog)
• Give lowest priority to threads with instructions closest to the head of the INT or FP instruction queues
– Threads with the oldest instructions are the most prone to IQ clog
• No Counter needed
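The counter-based policies above share one selection rule: fetch from the thread with the smallest count. A minimal sketch of that rule follows, using ICOUNT as the example; the function name and the tie-break toward the lowest thread id are assumptions made for illustration.

```python
from typing import Dict


def pick_fetch_thread(counts: Dict[int, int]) -> int:
    """Return the id of the thread with the smallest counter value.

    For ICOUNT the counter is the number of instructions the thread has
    in the decode/rename/IQ stages. Ties go to the lowest thread id,
    which is an arbitrary choice for this sketch.
    """
    return min(counts, key=lambda tid: (counts[tid], tid))
```

Passing branch counts or outstanding D-cache miss counts instead gives the BRCOUNT and MISSCOUNT variants, since only the counter changes.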
13
Resource Sharing
• Could be tricky when threads compete for the resources
• Static
– Less complexity
– Could penalize threads (e.g. instruction window size)
– P4’s Hyperthreading
• Dynamic
– Complex
– What is fair? How to quantify fairness?
• A growing concern in Multi-core processors
– Shared L2, Bus bandwidth, etc.
– Issues
• Fairness
• Mutual thrashing
14
P4 HyperThreading Resource Partitioning
• TC (or UROM) is accessed alternately, one cycle per logical processor, unless one is stalled due to a TC miss
• µop queue (split in ½) after fetch from TC
• ROB (126/2)
• LB (48/2)
• SB (24/2) (32/2 for Prescott)
• General µop queue and memory µop queue (1/2)
• TLB (½?) as there is no PID
• Retirement: alternating between the 2 logical processors
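The static split means each logical processor sees half of every partitioned buffer. The sketch below just works out the per-thread entry counts from the totals listed on the slide; the dictionary and function names are illustrative.

```python
from typing import Dict

# Total entries from the slide (SB is 32 on Prescott).
PARTITIONED_BUFFERS: Dict[str, int] = {"ROB": 126, "LB": 48, "SB": 24}


def per_logical_processor(buffers: Dict[str, int],
                          logical_processors: int = 2) -> Dict[str, int]:
    """Entries available to each logical processor under an even static split."""
    return {name: total // logical_processors for name, total in buffers.items()}
```

A thread running alone in HyperThreading mode is therefore limited to 63 ROB entries, which is the "could penalize threads" cost of static partitioning noted on the previous slide.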
15
Alpha 21464 (EV8) Processor
Technology
• Leading edge process technology – 1.2 ~ 2.0GHz
– 0.125µm CMOS
– SOI-compatible
– Cu interconnect
– low-k dielectrics
• Chip characteristics
– ~1.2V Vdd
– ~250 Million transistors
– ~1100 signal pins in flip chip packaging
16
Alpha 21464 (EV8) Processor
Architecture
• Enhanced out-of-order execution (that giant 2Bc-gskew
predictor we discussed before is here)
• Large on-chip L2 cache
• Direct RAMBUS interface
• On-chip router for system interconnect
• Glueless, directory-based, ccNUMA for up to 512-way SMP
• 8-wide superscalar
• 4-way simultaneous multithreading (SMT)
– Total die overhead ~ 6% (allegedly)
17
SMT Pipeline
[Figure: SMT pipeline stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures shown: PC, Icache, Register Map, Regs, Dcache, Regs]
Source: A company once called Compaq
18
EV8 SMT
• In SMT mode, it is as if there are 4 processors on a chip that share their caches and TLB
• Replicated hardware contexts
– Program counter
– Architected registers (actually just the renaming table
since architected registers and rename registers come
from the same physical pool)
• Shared resources
– Rename register pool (larger than needed by 1 thread)
– Instruction queue
– Caches
– TLB
– Branch predictors
• Cancelled before seeing the light of day.
19
Reality Check, circa 200x
• Conventional processor designs run out of steam
– Power wall (thermal)
– Complexity (verification)
– Physics (CMOS scaling)
[Figure: power density (Watts/cm2, log scale from 1 to 1000) versus process generation (1.5µ down to 0.07µ) for i386, i486, Pentium, Pentium Pro, Pentium II, and Pentium III processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface]
“Surpassed hot-plate power density in 0.5µm; Not too long to reach nuclear reactor,” Former Intel Fellow Fred Pollack.
20
Latest Power Density Trend
Yeo and Lee, “Peeling the Power Onion of Data Centers,” In
Energy Efficient Thermal Management of Data Centers, Springer. To appear 2011
21
Reality Check, circa 200x
• Conventional processor designs run out of steam
– Power wall (thermal)
– Complexity (verification)
– Physics (CMOS scaling)
• Unanimous direction → Multi-core
– Simple cores (massive number)
– Keep
• Wire communication on leash
• Gordon Moore happy (Moore’s Law)
– Architects’ menace: kick the ball to the other side of the court?
• What do you (or your customers) want?
– Performance (and/or availability)
– Throughput > latency (turnaround time)
– Total cost of ownership (performance per dollar)
– Energy (performance per watt)
– Reliability and dependability, SPAM/spy free
22
Multi-core Processor Gala
23
Intel’s Multicore Roadmap
• To extend Moore’s Law
• To delay the ultimate limit of physics
• By 2010
– all Intel processors delivered will be multicore
– Intel’s 80-core processor (FPU array)
Source: Adapted from Tom’s Hardware
[Figure: Intel roadmap for 2006, 2007, and 2008 across desktop, mobile, and enterprise processors, moving from single-core (SC) parts with 512KB to 2MB cache, through dual-core (DC) parts with 2MB to 16MB cache and quad-core (QC) parts with 4MB to 16MB shared cache, to 8-core (8C) 12MB shared-cache parts at 45nm]
24
Is a Multi-core really better off?
Well, it is hard to say in Computing World
If you were plowing a field,
which would you rather use:
Two strong oxen or 1024 chickens?
--- Seymour Cray
25
Intel TeraFlops Research Prototype
• 2KB Data Memory
• 3KB Instruction Memory
• No coherence support
• 2 FMACs
• Next-gen had 3D-integrated memory
– SRAM first
– Then DRAM
– Intel did not report further results
Intel Single-chip Cloud Computer (SCC)
Scalable many-core architecture
• Dual-core (P54C x86) tile
• 24 “tiles”
Advanced power management
• Each tile can run at its own frequency
• Groupings of 4 tiles can run at their own voltage
• 25W to 125W
• 4 DDR3 controllers
• NoC
27
Georgia Tech 64-Core 3D-MAPS Many-Core Chip
[Figure labels: single core, single SRAM tile]
• 3D-stacked many-core processor
• Fast, high-density face-to-face vias for high bandwidth
• Wafer-to-wafer bonding
• @277MHz, peak data B/W ~ 70.9GB/sec
[Figure labels: data SRAM, F2F via bus, 2-way VLIW core]
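The peak bandwidth figure is consistent with every core moving a fixed number of bytes to or from its SRAM tile each cycle. The sketch below assumes 4 bytes per core per cycle; that value is an assumption chosen because it reproduces the slide's number, and it is not stated on the slide.

```python
CORES = 64                      # from the slide title
CLOCK_HZ = 277e6                # 277 MHz, from the slide
BYTES_PER_CORE_PER_CYCLE = 4    # ASSUMPTION: not stated on the slide


def peak_bandwidth_gb_s(cores: int = CORES,
                        clock_hz: float = CLOCK_HZ,
                        bytes_per_cycle: int = BYTES_PER_CORE_PER_CYCLE) -> float:
    """Aggregate peak data bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return cores * clock_hz * bytes_per_cycle / 1e9
```

Under that assumption the result is about 70.9 GB/s, matching the quoted peak.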
28
Is a Multi-core really better off?
DEEP BLUE
480 chess chips
Can evaluate 200,000,000 moves per second!!
29
IBM Watson Jeopardy! Competition (2011.2.)
• POWER7 chips (2,880 cores) + 16TB memory
• Massively parallel processing
• Combine: Processing power, Natural language processing,
AI, Search, Knowledge extraction
30
Major Challenges for Multi-Core Designs
• Communication
– Memory hierarchy
– Data allocation (you have a large shared L2/L3 now)
– Interconnection network
• AMD HyperTransport
• Intel QPI
– Scalability
– Bus Bandwidth, how to get there?
• Power-Performance — Win or lose?
– Borkar’s multicore arguments
• 15% per core performance drop → 50% power saving
• Giant, single core wastes power when task is small
– How about leakage?
• Process variation and yield
• Programming Model
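Borkar's power-performance argument can be made concrete with the two numbers on this slide. The sketch below assumes the workload scales perfectly across cores, which is an idealization and not a claim from the slide; all names are illustrative.

```python
from typing import Tuple


def scale_out(cores: int,
              perf_per_core: float,
              power_per_core: float) -> Tuple[float, float, float]:
    """Return (throughput, power, throughput per watt), all relative to one
    full-speed core, assuming perfectly parallel work (an idealization)."""
    throughput = cores * perf_per_core
    power = cores * power_per_core
    return throughput, power, throughput / power


# Two cores, each 15% slower and each using 50% of the power.
throughput, power, efficiency = scale_out(2, 0.85, 0.5)
```

Under these assumptions two slower cores deliver about 1.7x the throughput of the single fast core within the same power budget. Leakage, raised as a question on the slide, is ignored here.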
31
Intel Core 2 Duo
• Homogeneous cores
• Bus-based on-chip interconnect
• Shared on-die cache memory
• Traditional I/O
[Figure callouts: cores are classic OOO (reservation stations, issue ports, schedulers, etc.); cache is large, shared, set-associative, with prefetch, etc.]
Source: Intel Corp.
32
Core 2 Duo Microarchitecture
33
Why Sharing on-die L2?
• What happens when L2 is too large?
34
Intel Core 2 Duo (Merom)
35
Core™ μArch: Wide Dynamic Execution
36
Core™ μArch: Wide Dynamic Execution
37
Core™ μArch: Macro Fusion
• Common “Intel 32” instruction pairs are combined
• 4-1-1-1 decoder that sustains 7 μops per cycle
• 4+1 = 5 “Intel 32” instructions per cycle
38
Micro(-ops) Fusion (from Pentium M)
• A misnomer
• Instead of breaking an Intel32 instruction up into μops, they decided not to break it up
• A better naming scheme would call the previous technique “IA32 fission”
• To fuse
– Store address and store data μops
– Load-and-op μops (e.g. ADD (%esp), %eax)
• Extend each RS entry to take 3 operands
• To reduce
– micro-ops (10% reduction in the OOO logic)
– Decoder bandwidth (a simple decoder can decode a fusion-type instruction)
– Energy consumption
• Performance improved by 5% for INT and 9% for FP (Pentium M data)
39
Smart Memory Access
40
Intel Quad-Core Processor
(Kentsfield, Clovertown)
Source: Intel
41
AMD Quad-Core Processor (Barcelona)
• True 128-bit SSE (as opposed to 64-bit in prior Opterons)
• Sideband Stack optimizer
– Parallelize many POPs and PUSHes (which were dependent on each other)
• Convert them into pure load/store instructions
– No uops in FUs for stack pointer adjustment
[Figure callout: on a different power plane from the cores]
Source: AMD
42
Barcelona’s Cache Architecture
Source: AMD
43
Intel Penryn Dual-Core (First 45nm µprocessor)
• High K dielectric metal gate
• 47 new SSE4 instructions
• Up to 12MB L2
• > 3GHz
Source: Intel
44
Intel Arrandale Processor
• 32nm
• Unified 3MB L3
• Power sharing (Turbo Boost) between cores and gfx via DFS
45
AMD 12-Core “Magny-Cours” Opteron
• 45nm
• 4 memory channels
46
Sun UltraSparc T1
• Eight cores, each 4-way threaded
• Fine-grained multithreading
– Thread-selection logic
• Takes out threads that encounter long-latency events
– Round-robin, cycle by cycle
– 4 threads in a group share a processing pipeline (Sparc pipe)
• 1.2 GHz (90nm)
• In-order, 8 instructions per cycle (single issue from each core)
• Caches
– 16K 4-way 32B L1-I
– 8K 4-way 16B L1-D
– Blocking cache (reason for MT)
– 4-banked 12-way 3MB L2 + 4 memory controllers (shared by all)
– Data moved between the L2 and the cores using an integrated crossbar switch to provide high throughput (200GB/s)
47
Sun UltraSparc T1
• Thread-select logic marks a thread inactive
based on
– Instruction type
• A predecode bit in the I-cache indicates a long-latency instruction
– Misses
– Traps
– Resource conflicts
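The selection behaviour described on the two T1 slides (round-robin among threads, skipping those marked inactive) can be sketched as follows. This is an illustration of the idea only; the function name and interface are hypothetical and do not describe the actual T1 logic.

```python
from typing import List, Optional


def select_thread(active: List[bool], last: int) -> Optional[int]:
    """Pick the next active thread after `last` in round-robin order.

    `active[i]` is False while thread i is marked inactive (for example on
    a miss, trap, or long-latency instruction). Returns None when no thread
    is ready, in which case the pipeline would idle for that cycle.
    """
    n = len(active)
    for step in range(1, n + 1):
        tid = (last + step) % n
        if active[tid]:
            return tid
    return None
```

With four threads per core, a thread that is the only active one is selected every cycle, so the pipeline degrades to single-threaded execution instead of stalling.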
48
Sun UltraSparc T2
• A fatter version of T1
• 1.4GHz (65nm)
• 8 threads per core, 8 cores on-die
• 1 FPU per core (1 FPU per die in T1), 16 INT EU (8 in T1)
• L2 increased to 8-banked 16-way 4MB shared
• 8 stage integer pipeline ( as opposed to 6 for T1)
• 16 instructions per cycle
• One PCI Express port (x8 1.0)
• Two 10 Gigabit Ethernet ports with packet classification and filtering
• Eight encryption engines
• Four dual-channel FBDIMM memory controllers
• 711 signal I/O, 1831 total
49
STI Cell Broadband Engine
• Heterogeneous!
• 9 cores, 10 threads
• 64-bit PowerPC
• Eight SPEs
– In-order, Dual-issue
– 128-bit SIMD
– 128x128b RF
– 256KB LS
– Fast Local SRAM
– Globally coherent DMA (128B/cycle)
– 128+ concurrent transactions to memory per core
• High bandwidth
– EIB (96B/cycle)
50
Cell Chip Block Diagram
[Figure: Cell chip block diagram; legible labels: Synergistic, Memory flow controller]
BACKUP
52
Non-Uniform Cache Architecture
• Proposed by UT-Austin at ASPLOS 2002
• Facts
– Large shared on-die L2
– Wire delay dominates on-die cache access
• 180nm, 1999: 1MB, 3 cycles
• 90nm, 2004: 4MB, 11 cycles
• 50nm, 2010: 16MB, 24 cycles
53
Multi-banked L2 cache
Bank=128KB
11 cycles
2MB @ 130nm
Bank Access time = 3 cycles
Interconnect delay = 8 cycles
54
Multi-banked L2 cache
Bank=64KB
47 cycles
16MB @ 50nm
Bank Access time = 3 cycles
Interconnect delay = 44 cycles
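The two multi-banked configurations can be compared with a short calculation: total latency is bank access time plus interconnect delay, and the bank count is the cache size divided by the bank size. The function names below are illustrative.

```python
def l2_access_cycles(bank_cycles: int, interconnect_cycles: int) -> int:
    """Total access latency: bank access time plus interconnect delay."""
    return bank_cycles + interconnect_cycles


def bank_count(cache_kb: int, bank_kb: int) -> int:
    """Number of banks needed to build the cache from equal-sized banks."""
    return cache_kb // bank_kb


def wire_fraction(bank_cycles: int, interconnect_cycles: int) -> float:
    """Share of the total latency spent on the interconnect."""
    return interconnect_cycles / l2_access_cycles(bank_cycles, interconnect_cycles)
```

For the 16MB cache at 50nm, the interconnect accounts for 44 of 47 cycles, which is the wire-delay problem that motivates NUCA.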
55
Static NUCA-1
• Use private per-bank channel
• Each bank has its distinct access latency
• Statically decide data location for its given address
• Average access latency = 34.2 cycles
• Wire overhead = 20.9% → an issue
[Figure: S-NUCA-1 organization; labels: bank, sub-bank, tag array, data bus, address bus, predecoder, sense amplifier, wordline driver and decoder]
56
Static NUCA-2
• Use a 2D switched network to alleviate wire area overhead
• Average access latency = 24.2 cycles
• Wire overhead = 5.9%
[Figure: S-NUCA-2 organization; labels: bank, data bus, switch, tag array, wordline driver and decoder, predecoder]

Más contenido relacionado

La actualidad más candente

Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWHsien-Hsin Sean Lee, Ph.D.
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Hsien-Hsin Sean Lee, Ph.D.
 
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...Hsien-Hsin Sean Lee, Ph.D.
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet FiltersKernel TLV
 
Linux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureLinux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureRyo Jin
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Marina Kolpakova
 
Why my network does not work? Networking Quiz 2017
Why my network does not work? Networking Quiz 2017Why my network does not work? Networking Quiz 2017
Why my network does not work? Networking Quiz 2017Andriy Berestovskyy
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingMerck Hung
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me_xhr_
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemMarina Kolpakova
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
 
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler OptimizationsPragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler OptimizationsMarina Kolpakova
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleMarina Kolpakova
 
The Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerThe Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerSasha Goldshtein
 
RISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentorRISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentorRISC-V International
 

La actualidad más candente (20)

Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
 
ARM Architecture in Details
ARM Architecture in Details ARM Architecture in Details
ARM Architecture in Details
 
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
 
The Spectre of Meltdowns
The Spectre of MeltdownsThe Spectre of Meltdowns
The Spectre of Meltdowns
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
Linux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureLinux on ARM 64-bit Architecture
Linux on ARM 64-bit Architecture
 
Pipelining1
Pipelining1Pipelining1
Pipelining1
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
 
Why my network does not work? Networking Quiz 2017
Why my network does not work? Networking Quiz 2017Why my network does not work? Networking Quiz 2017
Why my network does not work? Networking Quiz 2017
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Arm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefingArm v8 instruction overview android 64 bit briefing
Arm v8 instruction overview android 64 bit briefing
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler OptimizationsPragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
 
Code GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principleCode GPU with CUDA - Device code optimization principle
Code GPU with CUDA - Device code optimization principle
 
The Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF PrimerThe Next Linux Superpower: eBPF Primer
The Next Linux Superpower: eBPF Primer
 
ARM 64bit has come!
ARM 64bit has come!ARM 64bit has come!
ARM 64bit has come!
 
RISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentorRISC-V 30907 summit 2020 joint picocom_mentor
RISC-V 30907 summit 2020 joint picocom_mentor
 

Destacado

Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- CoherenceLec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- CoherenceHsien-Hsin Sean Lee, Ph.D.
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Hsien-Hsin Sean Lee, Ph.D.
 
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1Hsien-Hsin Sean Lee, Ph.D.
 
B sc cs i bo-de u-iii counters & registers
B sc cs i bo-de u-iii counters & registersB sc cs i bo-de u-iii counters & registers
B sc cs i bo-de u-iii counters & registersRai University
 
14827 shift registers
14827 shift registers14827 shift registers
14827 shift registersSandeep Kumar
 
2.3 sequantial logic circuit
2.3 sequantial logic circuit2.3 sequantial logic circuit
2.3 sequantial logic circuitWan Afirah
 
Overview of Shift register and applications
Overview of Shift register and applicationsOverview of Shift register and applications
Overview of Shift register and applicationsKarthik Kumar
 
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...Hsien-Hsin Sean Lee, Ph.D.
 
Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...
Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...
Lec20 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Da...Hsien-Hsin Sean Lee, Ph.D.
 
Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...
Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...
Lec14 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Se...Hsien-Hsin Sean Lee, Ph.D.
 
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- IntroLec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- IntroHsien-Hsin Sean Lee, Ph.D.
 
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...Hsien-Hsin Sean Lee, Ph.D.
 
Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...
Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...
Lec10 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Mu...Hsien-Hsin Sean Lee, Ph.D.
 

Destacado (20)

Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- CoherenceLec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
Lec14 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech --- Coherence
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
 
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1
Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1
 
Semiconductor
SemiconductorSemiconductor
Semiconductor
 
Shift Register
Shift RegisterShift Register
Shift Register
 
B sc cs i bo-de u-iii counters & registers
B sc cs i bo-de u-iii counters & registersB sc cs i bo-de u-iii counters & registers
B sc cs i bo-de u-iii counters & registers
 
Digital 9 16
Digital 9 16Digital 9 16
Digital 9 16
 
digital Counter
digital Counterdigital Counter
digital Counter
 
Counter And Sequencer Design- Student
Counter And Sequencer Design- StudentCounter And Sequencer Design- Student
Counter And Sequencer Design- Student
 
14827 shift registers
14827 shift registers14827 shift registers
14827 shift registers
 
2.3 sequantial logic circuit
2.3 sequantial logic circuit2.3 sequantial logic circuit
2.3 sequantial logic circuit
 
Overview of Shift register and applications
Overview of Shift register and applicationsOverview of Shift register and applications
Lec13 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Multicore

  • 1. ECE 4100/6100 Advanced Computer Architecture Lecture 13 Multithreading and Multicore Processors Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute of Technology
  • 2. TLP
    • ILP of a single program is hard
      – Large ILP is far-flung
      – We are human after all, program w/ sequential mind
    • Reality: running multiple threads or programs
    • Thread-Level Parallelism
      – Time multiplexing
      – Throughput computing
      – Multiple program workloads
      – Multiple concurrent threads
      – Helper threads to improve single-program performance
  • 3. Multi-Tasking Paradigm
    • Virtual memory makes it easy
    • Context switch could be expensive or require extra HW
      – VIVT cache
      – VIPT cache
      – TLBs
    [Figure: functional-unit occupancy (FU1 to FU4) per execution time quantum on a conventional single-threaded superscalar. Legend: Threads 1 to 5, Unused]
  • 4. Multi-threading Paradigm
    [Figure: functional-unit occupancy (FU1 to FU4) over execution time for five designs: conventional single-threaded superscalar; fine-grained multithreading (cycle-by-cycle interleaving); coarse-grained multithreading (block interleaving); chip multiprocessor (CMP or multicore); simultaneous multithreading (SMT). Legend: Threads 1 to 5, Unused]
  • 5. Conventional Multithreading
    • Zero-overhead context switch
    • Duplicated contexts for threads
    [Figure: register file holding four thread contexts (0:r0 to 0:r7, 1:r0 to 1:r7, 2:r0 to 2:r7, 3:r0 to 3:r7) selected by CtxtPtr; memory is shared by threads]
  • 6. Cycle Interleaving MT
    • Per-cycle, per-thread instruction fetching
    • Examples: HEP, Horizon, Tera MTA, MIT M-machine
    • Interesting questions to consider
      – Does it need a sophisticated branch predictor?
      – Or does it need any speculative execution at all?
        • Get rid of “branch prediction”?
        • Get rid of “predication”?
      – Does it need any out-of-order execution capability?
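One way to think about the branch-predictor question on this slide is the following minimal sketch. It assumes strict round-robin fetching, one instruction per thread per cycle, and a hypothetical `branch_resolve_latency` parameter (cycles from fetch to branch resolution). None of these names come from the slides.

```python
def fetch_order(num_threads, cycles):
    """Thread id fetched in each cycle under strict round-robin interleaving."""
    return [cycle % num_threads for cycle in range(cycles)]


def branch_resolved_before_refetch(num_threads, branch_resolve_latency):
    """True if a thread's branch resolves before that thread fetches again.

    Under round-robin interleaving a thread fetches once every
    num_threads cycles, so no prediction is needed when that gap
    is at least the (assumed) branch resolution latency.
    """
    return num_threads >= branch_resolve_latency


order = fetch_order(num_threads=4, cycles=6)
enough_threads = branch_resolved_before_refetch(num_threads=8, branch_resolve_latency=5)
too_few_threads = branch_resolved_before_refetch(num_threads=2, branch_resolve_latency=5)
```

With enough ready threads, the gap between two fetches of the same thread covers the branch latency, which is why such machines may be able to drop speculation altogether.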
  • 7. Tera Multi-Threaded Architecture
    • Cycle-by-cycle interleaving
    • MTA can context-switch every cycle (3 ns)
    • As many as 128 distinct threads (hiding 384 ns)
    • 3-wide VLIW instruction format (M+ALU+ALU/Br)
    • Each instruction has a 3-bit field for dependence lookahead
      – Determines whether there is a dependency with subsequent instructions
      – Execute up to 7 future VLIW instructions (before switch)
    Loop:
      nop       r1=r2+r3    r5=r6+4       lookahead=1
      nop       r8=r9-r10   r11=r12-r13   lookahead=2
      [r5]=r1   r4=r4-1     bnz Loop      lookahead=0
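The two numbers on this slide follow from simple arithmetic, sketched below. The function names are illustrative, and the sketch assumes each of the 128 threads issues once per round of interleaving.

```python
CYCLE_NS = 3          # one context switch per 3 ns cycle (from the slide)
MAX_THREADS = 128     # distinct hardware threads (from the slide)
LOOKAHEAD_BITS = 3    # width of the dependence lookahead field (from the slide)


def hidden_latency_ns(threads=MAX_THREADS, cycle_ns=CYCLE_NS):
    """Time between two issue slots of the same thread when every
    thread gets one cycle per round: the latency the others can hide."""
    return threads * cycle_ns


def max_lookahead(bits=LOOKAHEAD_BITS):
    """Largest value an unsigned field of the given width can encode."""
    return (1 << bits) - 1


latency = hidden_latency_ns()
lookahead = max_lookahead()
```

Here `latency` is 128 x 3 ns and `lookahead` is the largest 3-bit value, matching the 384 ns and "up to 7 future VLIW instructions" figures above.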
  • 8. Block Interleaving MT
    • Context switch on a specific event (dynamic pipelining)
      – Explicit switching: implementing a switch instruction
      – Implicit switching: triggered when a specific instruction class is fetched
    • Static switching (switch upon fetching)
      – Switch-on-memory-instructions: Rhamma processor
      – Switch-on-branch or switch-on-hard-to-predict-branch
      – Trigger can be an implicit or explicit instruction
    • Dynamic switching
      – Switch-on-cache-miss (switch in later pipeline stage): MIT Sparcle (MIT Alewife’s node), Rhamma processor
      – Switch-on-use (lazy strategy of switch-on-cache-miss)
        • Wait until the last minute
        • Valid bit needed for each register
          – Clear when load issued, set when data returned
      – Switch-on-signal (e.g. interrupt)
      – Predicated switch instruction based on conditions
    • No need to support a large number of threads
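The switch-on-use rule (one valid bit per register, cleared when a load issues and set when data returns) can be sketched as below. The `ThreadContext` class and its method names are hypothetical, used only to illustrate the bookkeeping, and are not taken from any real processor.

```python
class ThreadContext:
    """Per-thread register valid bits for a switch-on-use policy (illustrative)."""

    def __init__(self, num_regs=8):
        # Every register starts out holding valid data.
        self.valid = [True] * num_regs

    def issue_load(self, dest_reg):
        # Clear the valid bit when a load to dest_reg is issued.
        self.valid[dest_reg] = False

    def data_returned(self, dest_reg):
        # Set the valid bit when the loaded data comes back.
        self.valid[dest_reg] = True

    def must_switch(self, source_regs):
        # Switch only when an instruction actually uses a pending register.
        return any(not self.valid[r] for r in source_regs)


ctx = ThreadContext()
ctx.issue_load(1)                       # load into r1 is outstanding
independent = ctx.must_switch([2, 3])   # does not touch r1: keep running
dependent = ctx.must_switch([1, 2])     # reads r1: switch now
ctx.data_returned(1)
after_return = ctx.must_switch([1, 2])  # data is back: no switch needed
```

Compared with switch-on-cache-miss, the thread keeps executing independent instructions after the miss and gives up the pipeline only at the first real use.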
  • 9. NVidia Fermi GPGPU Architecture
  • 10. Nvidia’s Streaming Multiprocessor (SM)
    • SIMD execution model
    • Issue one instruction from each warp to 16 CUDA cores
    • One warp = 32 parallel threads
    • Compute capability 2.0 allows 1536 resident threads (i.e., 48 warps) in one SM
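The warp count on this slide is a straightforward division, shown in the sketch below. The second function rests on an assumption of this sketch, not a statement from the slide: that the 32 threads of a warp are spread evenly over the 16 cores they are issued to.

```python
WARP_SIZE = 32               # threads per warp (from the slide)
CORES_PER_ISSUE = 16         # CUDA cores one warp instruction is issued to (from the slide)
MAX_RESIDENT_THREADS = 1536  # per SM at compute capability 2.0 (from the slide)


def resident_warps(threads=MAX_RESIDENT_THREADS, warp_size=WARP_SIZE):
    """Number of whole warps that fit in the resident-thread budget."""
    return threads // warp_size


def passes_per_warp_instruction(warp_size=WARP_SIZE, cores=CORES_PER_ISSUE):
    """Passes needed to push one warp instruction through the cores,
    assuming an even spread of threads (ceiling division)."""
    return -(-warp_size // cores)


warps = resident_warps()
passes = passes_per_warp_instruction()
```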
  • 11. Simultaneous Multithreading (SMT)
    • SMT name first used by UW; earlier versions from UCSB [Nemirovsky, HICSS‘91] and [Hirata et al., ISCA-92]
    • Intel’s HyperThreading (2-way SMT)
    • IBM Power7 (4/6/8 cores, 4-way SMT); IBM Power5/6 (2 cores, each 2-way SMT, 4 chips per package); Power5 has OoO cores, Power6 in-order cores
    • Basic ideas: Conventional MT + Simultaneous issue + Sharing common resources
    [Figure: SMT datapath block diagram: multiple PCs, fetch unit, I-cache, decode, register renamer, per-thread register files, RS & ROB plus physical register file, functional units (ALU1, ALU2, FAdd (2 cycles), FMult (4 cycles), Fdiv, unpipelined (16 cycles), Load/Store (variable)), D-cache]
  • 12. Instruction Fetching Policy
    • FIFO, Round Robin: simple but may be too naive
    • Adaptive Fetching Policies
      – BRCOUNT (reduce wrong-path issuing)
        • Count # of br inst in decode/rename/IQ stages
        • Give top priority to thread with the least BRCOUNT
      – MISSCOUNT (reduce IQ clog)
        • Count # of outstanding D-cache misses
        • Give top priority to thread with the least MISSCOUNT
      – ICOUNT (reduce IQ clog)
        • Count # of inst in decode/rename/IQ stages
        • Give top priority to thread with the least ICOUNT
      – IQPOSN (reduce IQ clog)
        • Give lowest priority to those threads with inst closest to the head of INT or FP instruction queues
          – Threads with the oldest instructions are the most prone to IQ clog
        • No counter needed
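The three counter-based policies share one selection rule: fetch from the thread with the smallest counter. A minimal sketch follows; the `pick_thread` name, the dictionary layout, and the lowest-id tie-break are assumptions made for illustration.

```python
def pick_thread(counts):
    """Return the thread id with the smallest counter value.

    counts maps thread id -> counter (ICOUNT, BRCOUNT, or MISSCOUNT).
    Ties go to the lowest thread id, an arbitrary choice in this sketch.
    """
    return min(counts, key=lambda tid: (counts[tid], tid))


# ICOUNT: instructions each thread has in the decode/rename/IQ stages.
icount = {0: 12, 1: 5, 2: 9}
chosen = pick_thread(icount)

# The same rule applies to the other counters, e.g. outstanding D-cache misses.
misscount = {0: 3, 1: 3, 2: 4}
chosen_on_tie = pick_thread(misscount)
```

A thread that is clogging the instruction queue accumulates a high count and is skipped until its instructions drain, which is the behavior the slide attributes to ICOUNT and MISSCOUNT.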
  • 13. Resource Sharing
    • Could be tricky when threads compete for the resources
    • Static
      – Less complexity
      – Could penalize threads (e.g. instruction window size)
      – P4’s Hyperthreading
    • Dynamic
      – Complex
      – What is fair? How to quantify fairness?
    • A growing concern in multi-core processors
      – Shared L2, bus bandwidth, etc.
      – Issues
        • Fairness
        • Mutual thrashing
  • 14. P4 HyperThreading Resource Partitioning
    • TC (or UROM) is accessed alternately per cycle for each logical processor unless one is stalled due to a TC miss
    • µop queue (split into ½) after fetch from TC
    • ROB (126/2)
    • LB (48/2)
    • SB (24/2) (32/2 for Prescott)
    • General µop queue and memory µop queue (1/2)
    • TLB (½?) as there is no PID
    • Retirement: alternating between 2 logical processors
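The static split above gives each logical processor a fixed half of every partitioned buffer. The sketch below works out those per-thread entry counts from the totals on the slide. The dictionary and function names are illustrative only.

```python
# Total entries per structure, as listed on the slide.
ENTRIES = {"ROB": 126, "LB": 48, "SB": 24}
ENTRIES_PRESCOTT = {"ROB": 126, "LB": 48, "SB": 32}  # only the SB size differs


def per_logical_processor(entries, logical_processors=2):
    """Entries each logical processor gets under an equal static split."""
    return {name: total // logical_processors for name, total in entries.items()}


share = per_logical_processor(ENTRIES)
share_prescott = per_logical_processor(ENTRIES_PRESCOTT)
```

The cost of this simplicity is the penalty noted on the Resource Sharing slide: under this split a thread never gets more than its half.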
  • 15. Alpha 21464 (EV8) Processor Technology
    • Leading edge process technology
      – 1.2 ~ 2.0GHz
      – 0.125µm CMOS
      – SOI-compatible
      – Cu interconnect
      – low-k dielectrics
    • Chip characteristics
      – ~1.2V Vdd
      – ~250 Million transistors
      – ~1100 signal pins in flip chip packaging
  • 16. Alpha 21464 (EV8) Processor Architecture
    • Enhanced out-of-order execution (that giant 2Bc-gskew predictor we discussed before is here)
    • Large on-chip L2 cache
    • Direct RAMBUS interface
    • On-chip router for system interconnect
    • Glueless, directory-based, ccNUMA for up to 512-way SMP
    • 8-wide superscalar
    • 4-way simultaneous multithreading (SMT)
      – Total die overhead ~ 6% (allegedly)
  • 17. 17 SMT Pipeline [Figure: pipeline stages Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire; structures shown: PC, Icache, Register Map, Regs, Dcache, Regs] Source: A company once called Compaq
  • 18. 18 EV8 SMT • In SMT mode, it is as if there are 4 processors on a chip that share their caches and TLB • Replicated hardware contexts – Program counter – Architected registers (actually just the renaming table since architected registers and rename registers come from the same physical pool) • Shared resources – Rename register pool (larger than needed by 1 thread) – Instruction queue – Caches – TLB – Branch predictors • Deceased before seeing the daylight.
  • 19. 19 Reality Check, circa 200x • Conventional processor designs run out of steam – Power wall (thermal) – Complexity (verification) – Physics (CMOS scaling) [Figure: power density (Watts/cm2, log scale from 1 to 1000) versus process generation (1.5µ down to 0.07µ) for the i386, i486, Pentium, Pentium Pro, Pentium II, and Pentium III processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface] “Surpassed hot-plate power density in 0.5µm; Not too long to reach nuclear reactor,” Former Intel Fellow Fred Pollack.
  • 20. 20 Latest Power Density Trend Yeo and Lee, “Peeling the Power Onion of Data Centers,” In Energy Efficient Thermal Management of Data Centers, Springer. To appear 2011
  • 21. 21 Reality Check, circa 200x • Conventional processor designs run out of steam – Power wall (thermal) – Complexity (verification) – Physics (CMOS scaling) • Unanimous direction → Multi-core – Simple cores (massive number) – Keep • Wire communication on leash • Gordon Moore happy (Moore’s Law) – Architects’ menace: kick the ball to the other side of the court? • What do you (or your customers) want? – Performance (and/or availability) – Throughput > latency (turnaround time) – Total cost of ownership (performance per dollar) – Energy (performance per watt) – Reliability and dependability, SPAM/spy free
  • 23. 23 Intel’s Multicore Roadmap • To extend Moore’s Law • To delay the ultimate limit of physics • By 2010 – all Intel processors delivered will be multicore – Intel’s 80-core processor (FPU array) [Figure: 2006 to 2008 roadmap for desktop, mobile, and enterprise processors, moving from single-core (SC) and dual-core (DC) parts to quad-core (QC) and 8-core (8C) parts with shared caches, including 45nm parts] Source: Adapted from Tom’s Hardware
  • 24. 24 Is a Multi-core really better off? Well, it is hard to say in Computing World. If you were plowing a field, which would you rather use: Two strong oxen or 1024 chickens? --- Seymour Cray
  • 25. 25 Intel TeraFlops Research Prototype • 2KB Data Memory • 3KB Instruction Memory • No coherence support • 2 FMACs • Next-gen had 3D- integrated memory – SRAM first – Then DRAM – Intel did not report further result
  • 26. Intel Single-chip Cloud Computer (SCC) Scalable many-core architecture • Dual-core (P54C x86) tile • 24 “tiles” Advanced power management • Each tile can run at its own frequency • Groupings of 4 tiles can run at their own voltage • 25W to 125W • 4 DDR3 controllers • NoC
  • 27. 27 Georgia Tech 64-Core 3D-MAPS Many-Core Chip Single Core Single SRAM tile • 3D-stacked many-core processor • Fast, high-density face-to-face vias for high bandwidth • Wafer-to-wafer bonding • @277MHz, peak data B/W ~ 70.9GB/sec Data SRAM F2F via bus 2-way VLIW core
  • 28. 28 Is a Multi-core really better off? DEEP BLUE 480 chess chips Can evaluate 200,000,000 moves per second!!
  • 29. 29 IBM Watson Jeopardy! Competition (2011.2.) • POWER7 chips (2,880 cores) + 16TB memory • Massively parallel processing • Combine: Processing power, Natural language processing, AI, Search, Knowledge extraction
  • 30. 30 Major Challenges for Multi-Core Designs • Communication – Memory hierarchy – Data allocation (you have a large shared L2/L3 now) – Interconnection network • AMD HyperTransport • Intel QPI – Scalability – Bus Bandwidth, how to get there? • Power-Performance: win or lose? – Borkar’s multicore arguments • 15% per core performance drop → 50% power saving • Giant, single core wastes power when task is small – How about leakage? • Process variation and yield • Programming Model
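Borkar's argument above can be checked with back-of-the-envelope arithmetic. The sketch below normalizes one large core to performance 1.0 and power 1.0, then models each simpler core at 0.85 performance and 0.5 power, as quoted on the slide. It assumes a perfectly parallel workload and ignores leakage and shared uncore power, which are exactly the caveats the slide raises; the function name is hypothetical.

```python
def multicore_estimate(n_cores, perf_per_core=0.85, power_per_core=0.5):
    """Return (throughput, power) relative to one big core at (1.0, 1.0).

    Assumes the workload scales perfectly across cores and that power
    simply adds up; both are optimistic simplifications.
    """
    throughput = n_cores * perf_per_core
    power = n_cores * power_per_core
    return throughput, power


# Two slower cores in the same power budget as one big core.
throughput, power = multicore_estimate(2)
perf_per_watt = throughput / power
```

Under these assumptions, two cores deliver about 1.7x the throughput at the same power as the single large core, so performance per watt also rises by about 1.7x.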
  • 31. 31 Intel Core 2 Duo • Homogeneous cores • Bus-based on-chip interconnect • Shared on-die cache memory • Traditional I/O Classic OOO: Reservation Stations, Issue ports, Schedulers, etc. Large, shared, set associative, prefetch, etc. Source: Intel Corp.
  • 32. 32 Core 2 Duo Microarchitecture
  • 33. 33 Why Sharing on-die L2? • What happens when L2 is too large?
  • 34. 34 Intel Core 2 Duo (Merom)
  • 35. 35 Core™ μArch: Wide Dynamic Execution
  • 36. 36 Core™ μArch: Wide Dynamic Execution
  • 37. 37 Core™ μArch: Macro Fusion • Common “Intel 32” instruction pairs are combined • 4-1-1-1 decoder that sustains 7 μops per cycle • 4+1 = 5 “Intel 32” instructions per cycle
  • 38. 38 Micro(-ops) Fusion (from Pentium M) • A misnomer • Instead of breaking up an Intel32 instruction into μops, they decide not to break it up • A better naming scheme would call the previous techniques “IA32 fission” • To fuse – Store address and store data μops – Load-and-op μops (e.g. ADD (%esp), %eax) • Extend each RS entry to take 3 operands • To reduce – micro-ops (10% reduction in the OOO logic) – Decoder bandwidth (simple decoder can decode fusion type instruction) – Energy consumption • Performance improved by 5% for INT and 9% for FP (Pentium M data)
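The effect of fusion on the number of entries the out-of-order logic must track can be sketched with a simple count. The instruction classes, the per-class μop counts, and the sample instruction mix below are illustrative assumptions; they are not meant to reproduce the 10% figure quoted above.

```python
# μops per instruction class without fusion (illustrative assumption):
# a store splits into store-address + store-data, a load-and-op splits
# into load + ALU op, and a register-only op is a single μop.
UNFUSED_UOPS = {"reg_op": 1, "load_op": 2, "store": 2}

# Classes that the slide lists as fusion candidates.
FUSIBLE = {"load_op", "store"}


def tracked_uops(instructions, fusion):
    """Count the entries the OOO window tracks for an instruction list."""
    total = 0
    for kind in instructions:
        if fusion and kind in FUSIBLE:
            total += 1  # the pair occupies one fused entry
        else:
            total += UNFUSED_UOPS[kind]
    return total


# Hypothetical four-instruction sequence.
sequence = ["load_op", "reg_op", "store", "reg_op"]
without_fusion = tracked_uops(sequence, fusion=False)
with_fusion = tracked_uops(sequence, fusion=True)
```

For this hypothetical sequence the window tracks 6 entries without fusion and 4 with it; the fused pairs are still executed as separate operations, they just share one tracked entry.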
  • 40. 40 Intel Quad-Core Processor (Kentsfield, Clovertown) Source: Intel
  • 41. 41 AMD Quad-Core Processor (Barcelona) • True 128-bit SSE (as opposed to 64-bit in prior Opteron) • Sideband Stack optimizer – Parallelize many POPs and PUSHes (which were dependent on each other) • Convert them into pure load/store instructions – No uops in FUs for stack pointer adjustment On different power plane from the cores Source: AMD
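The idea behind the sideband stack optimizer can be sketched as tracking a running stack-pointer offset at decode time, so that each PUSH or POP becomes a store or load at a known offset from the original stack pointer and no longer waits on the previous stack-pointer update. This is a simplified model, not AMD's implementation; the tuple encoding, the function name, and the 8-byte word size are assumptions.

```python
def rewrite_stack_ops(ops, word_bytes=8):
    """Rewrite PUSH/POP into loads/stores relative to the initial SP.

    `ops` is a list of ("push" | "pop", register) tuples. Returns the
    rewritten operations as (kind, register, offset) and the net SP delta
    that must still be applied once at the end.
    """
    delta = 0  # running offset from the stack pointer at entry
    rewritten = []
    for op, reg in ops:
        if op == "push":
            delta -= word_bytes
            rewritten.append(("store", reg, delta))
        elif op == "pop":
            rewritten.append(("load", reg, delta))
            delta += word_bytes
        else:
            raise ValueError(f"unsupported op: {op}")
    return rewritten, delta


ops = [("push", "rax"), ("push", "rbx"), ("pop", "rcx"), ("pop", "rdx")]
rewritten, net_delta = rewrite_stack_ops(ops)
```

In the rewritten form every memory operation carries its own offset (-8, -16, -16, -8 here), so address generation no longer forms a serial chain through the stack pointer.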
  • 43. 43 Intel Penryn Dual-Core (First 45nm µprocessor) • High-k dielectric, metal gate • 47 new SSE4 instructions • Up to 12MB L2 • > 3GHz Source: Intel
  • 44. 44 Intel Arrandale Processor • 32nm • Unified 3MB L3 • Power sharing (Turbo Boost) between cores and gfx via DFS
  • 45. 45 AMD 12-Core “Magny-Cours” Opteron • 45nm • 4 memory channels
  • 46. 46 Sun UltraSparc T1 • Eight cores, each 4-way threaded • Fine-grained multithreading – a thread-selection logic • Take out threads that encounter long latency events – Round-robin cycle-by-cycle – 4 threads in a group share a processing pipeline (Sparc pipe) • 1.2 GHz (90nm) • In-order, 8 instructions per cycle (single issue from each core) • Caches – 16K 4-way 32B L1-I – 8K 4-way 16B L1-D – Blocking cache (reason for MT) – 4-banked 12-way 3MB L2 + 4 memory controllers. (shared by all) – Data moved between the L2 and the cores using an integrated crossbar switch to provide high throughput (200GB/s)
  • 47. 47 Sun UltraSparc T1 • Thread-select logic marks a thread inactive based on – Instruction type •A predecode bit in the I-cache to indicate long-latency instruction – Misses – Traps – Resource conflicts
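The thread-select behavior described above, round-robin among the threads of a core while skipping any thread marked inactive, can be sketched as follows. The function names and the per-cycle readiness pattern are hypothetical, and the real selection logic may differ from plain round-robin.

```python
def select_thread(last, ready):
    """Pick the next ready thread after `last` in round-robin order.

    `ready[i]` is False while thread i is marked inactive (long-latency
    instruction, miss, trap, or resource conflict). Returns None when no
    thread can issue this cycle.
    """
    n = len(ready)
    for step in range(1, n + 1):
        tid = (last + step) % n
        if ready[tid]:
            return tid
    return None


def schedule(ready_by_cycle):
    """Return the thread selected in each cycle, starting before thread 0."""
    last = -1
    order = []
    for ready in ready_by_cycle:
        tid = select_thread(last, ready)
        order.append(tid)
        if tid is not None:
            last = tid
    return order


# Hypothetical trace: thread 2 is inactive in cycles 2-3, all stall in cycle 4.
trace = [
    [True, True, True, True],
    [True, True, True, True],
    [True, True, False, True],
    [True, True, False, True],
    [False, False, False, False],
    [True, True, True, True],
]
order = schedule(trace)
```

The resulting order is 0, 1, 3, 0, no issue, 1: the inactive thread is skipped without wasting a cycle, and the pipeline only idles when every thread in the group is stalled.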
  • 48. 48 Sun UltraSparc T2 • A fatter version of T1 • 1.4GHz (65nm) • 8 threads per core, 8 cores on-die • 1 FPU per core (1 FPU per die in T1), 16 INT EU (8 in T1) • L2 increased to 8-banked 16-way 4MB shared • 8-stage integer pipeline (as opposed to 6 for T1) • 16 instructions per cycle • One PCI Express port (x8 1.0) • Two 10 Gigabit Ethernet ports with packet classification and filtering • Eight encryption engines • Four dual-channel FBDIMM memory controllers • 711 signal I/O, 1831 total
  • 49. 49 STI Cell Broadband Engine • Heterogeneous! • 9 cores, 10 threads • 64-bit PowerPC • Eight SPEs – In-order, Dual-issue – 128-bit SIMD – 128x128b RF – 256KB LS – Fast Local SRAM – Globally coherent DMA (128B/cycle) – 128+ concurrent transactions to memory per core • High bandwidth – EIB (96B/cycle)
  • 50. 50 Cell Chip Block Diagram Synergistic Memory flow controller
  • 52. 52 Non-Uniform Cache Architecture • Proposed by UT-Austin at ASPLOS 2002 • Facts – Large shared on-die L2 – Wire delay dominates on-die cache access • Access latency by generation – 1MB, 180nm, 1999: 3 cycles – 4MB, 90nm, 2004: 11 cycles – 16MB, 50nm, 2010: 24 cycles
  • 53. 53 Multi-banked L2 cache Bank=128KB 11 cycles 2MB @ 130nm Bank Access time = 3 cycles Interconnect delay = 8 cycles
  • 54. 54 Multi-banked L2 cache Bank=64KB 47 cycles 16MB @ 50nm Bank Access time = 3 cycles Interconnect delay = 44 cycles
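The totals on the two multi-banked slides are simply the bank access time plus the interconnect delay, and the bank counts follow from the cache and bank sizes. A short check of that arithmetic, with hypothetical function names:

```python
def l2_latency(bank_access_cycles, interconnect_cycles):
    """Total access latency: time inside the bank plus wire delay to reach it."""
    return bank_access_cycles + interconnect_cycles


def num_banks(cache_kb, bank_kb):
    """Number of equally sized banks in the cache."""
    return cache_kb // bank_kb


# 2MB at 130nm with 128KB banks, and 16MB at 50nm with 64KB banks.
latency_130nm = l2_latency(3, 8)
latency_50nm = l2_latency(3, 44)
banks_130nm = num_banks(2 * 1024, 128)
banks_50nm = num_banks(16 * 1024, 64)
interconnect_share_50nm = 44 / latency_50nm
```

The bank access time stays at 3 cycles in both configurations, so the growth from 11 to 47 cycles comes entirely from the interconnect, which accounts for roughly 94% of the latency in the 16MB case.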
  • 55. 55 Static NUCA-1 • Use private per-bank channel • Each bank has its distinct access latency • Statically decide data location for its given address • Average access latency = 34.2 cycles • Wire overhead = 20.9% → an issue [Figure labels: Bank, Sub-bank, Tag Array, Data Bus, Address Bus, Predecoder, Sense amplifier, Wordline driver and decoder]
  • 56. 56 Static NUCA-2 • Use a 2D switched network to alleviate wire area overhead • Average access latency = 24.2 cycles • Wire overhead = 5.9% [Figure labels: Bank, Data bus, Switch, Tag Array, Wordline driver and decoder, Predecoder]
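Static NUCA fixes the bank for every address and lets the latency vary with the bank's position. A minimal sketch of that idea follows, assuming the bank is chosen by the low-order bits of the cache line address and that latency grows by one cycle per hop on a small 2D grid. The line size, grid dimensions, hop cost, and controller position are all hypothetical; only the 3-cycle bank access time comes from the slides.

```python
LINE_BYTES = 64    # hypothetical cache line size
ROWS, COLS = 4, 4  # hypothetical 4x4 grid of banks
BANK_CYCLES = 3    # bank access time quoted on the multi-banked L2 slides
HOP_CYCLES = 1     # hypothetical cost of one network hop


def bank_of(addr):
    """Static mapping: low-order bits of the line address select the bank."""
    return (addr // LINE_BYTES) % (ROWS * COLS)


def access_latency(addr):
    """Latency depends only on where the statically chosen bank sits.

    The cache controller is assumed to sit next to bank 0 at grid (0, 0).
    """
    row, col = divmod(bank_of(addr), COLS)
    hops = row + col
    return BANK_CYCLES + HOP_CYCLES * hops


nearest = access_latency(0)                 # bank 0 at (0, 0)
middle = access_latency(5 * LINE_BYTES)     # bank 5 at (1, 1)
farthest = access_latency(15 * LINE_BYTES)  # bank 15 at (3, 3)
```

Because the mapping never changes, a frequently used line that happens to map to a far bank always pays the far-bank latency, which is the limitation of a purely static placement.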