CMPN301: Computer Architecture
Pipelining
Mayada Hadhoud
Computer Engineering Department
Cairo University
Agenda
• What is pipelining?
• Characteristics of pipelining
• Pipelining Hazards
– Structural Hazard
– Data Hazard
– Control Hazard
ENGR9861 Winter 2007 RV
What Is A Pipeline?
• Pipelining is used by virtually all modern
microprocessors to enhance performance by
overlapping the execution of instructions.
What Is Pipelining
• Laundry Example
• 4 persons each have one load of
clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
[Figure: four laundry loads labeled A, B, C, and D]
What Is Pipelining
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: sequential laundry timeline from 6 PM to midnight. Task order (loads A, B, C, D) vs. time; each load takes 30 + 40 + 20 minutes and starts only after the previous load finishes]
Appendix A - Pipelining
What Is Pipelining
Start work ASAP
• Pipelined laundry takes 3.5
hours for 4 loads
[Figure: pipelined laundry timeline starting at 6 PM. Task order (loads A, B, C, D) vs. time; the overlapped schedule takes 30 + 40 + 40 + 40 + 40 + 20 minutes]
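The two laundry totals can be checked with a small simulation. This is a minimal sketch, assuming a load may wait between stages and that each machine handles one load at a time; the function names are illustrative, not part of the slides.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = {"wash": 30, "dry": 40, "fold": 20}


def sequential_minutes(loads: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * sum(STAGES.values())


def pipelined_minutes(loads: int) -> int:
    """A load enters a stage once it has left the previous stage
    and that stage is free (assumption: loads may wait in between)."""
    stage_free_at = [0] * len(STAGES)  # time at which each stage becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(STAGES.values()):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish


print(sequential_minutes(4))  # 360 minutes = 6 hours
print(pipelined_minutes(4))   # 210 minutes = 3.5 hours
```

The 40-minute dryer is the slowest stage, so after the first load each additional load adds 40 minutes to the total.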
Pipelining Lessons
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
Pipelining Theoretical Performance
• An ideal pipeline divides a task into k independent
sequential subtasks
– Each subtask requires 1 time unit to complete
– The task itself requires k time units to complete
• For n iterations of task, the execution times:
– With no pipelining: nk time units
– With pipelining: k + (n-1) time units
• Speedup of a k-stage pipeline is
– S = nk / [k + (n - 1)], which approaches k for large n
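The speedup expression above can be evaluated directly. The following is a minimal sketch; the function name `ideal_speedup` and the sample values of n are assumptions chosen for illustration.

```python
def ideal_speedup(n: int, k: int) -> float:
    """Speedup of an ideal k-stage pipeline over n iterations of a task.

    Without pipelining: n * k time units.
    With pipelining:    k + (n - 1) time units.
    """
    unpipelined = n * k
    pipelined = k + (n - 1)
    return unpipelined / pipelined


# The speedup grows toward k as n increases (k = 5 here).
for n in (1, 4, 100, 1_000_000):
    print(n, round(ideal_speedup(n, k=5), 3))
```

For a single iteration there is no overlap, so the speedup is exactly 1; the value k is only a limit for large n.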
Characteristics Of Pipelining
• The previous expression is ideal.
• In terms of a CPU, the implementation of
pipelining has the effect of reducing the
average instruction time, therefore reducing
the average CPI.
• EX: If each instruction in a microprocessor
takes 5 clock cycles (unpipelined) and we have
a 4-stage pipeline, the ideal average CPI with
the pipeline will be 5/4 = 1.25.
RISC Instruction Set Basics (MIPS)
• Properties of RISC architectures:
– All operations on data apply to data in registers
and typically change the entire register (32-bits or
64-bits).
– The only operations that affect memory are
load/store operations. Memory to register and
register to memory.
– Usually, instructions are few and are typically one
size.
• ALU Instructions (R-type):
• Arithmetic operations, take two registers as operands.
The result is stored in a third register.
• Logical operations: AND, OR, XOR, shift
RISC Instruction Set Basics (MIPS)
Types of Instructions
R-Type Instruction Example
Immediate Format Instructions (I-type):
• Loads and stores usually take a register (the base register) as an operand and
a 16-bit immediate value. The sum of the two forms
the effective address. A second register acts as the
destination in the case of a load operation.
• In the case of a store operation, the second register
contains the data to be stored.
I-Type Instruction Example
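Jump Format (J-type)
• Conditional branches are transfers of control: a branch causes a
sign-extended immediate value to be added to the program counter.
In MIPS, conditional branches such as beq are encoded in the I-type
format; the J-type format is used by unconditional jumps (j, jal).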
RISC Instruction Set Implementation
• We first need to look at how instructions in the MIPS instruction
set are implemented without pipelining. We’ll assume that any
instruction of the subset of MIPS can be executed in at most 5
clock cycles.
• The five clock cycles will be broken up into the following steps:
• Instruction Fetch Cycle
• Instruction Decode/Register Fetch Cycle
• Execution Cycle
• Memory Access Cycle
• Write-Back Cycle
Fetching Instructions (IF)
• Fetching instructions involves
– reading the instruction from
the Instruction Memory
– updating the PC to hold the
address of the next
instruction
– PC is updated every cycle, so
it does not need an explicit
write control signal
– Instruction Memory is read
every cycle, so it doesn’t need
an explicit read control signal
[Figure: instruction fetch datapath. The PC supplies the Read Address to the Instruction Memory, and an adder computes PC + 4]
Decoding Instructions (ID)
• Decoding instructions involves
– sending the fetched instruction’s opcode and
function field bits to the control unit
– reading two values from the Register File
• Register File addresses are contained in the instruction
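[Figure: instruction decode datapath. The instruction supplies Read Addr 1, Read Addr 2, and Write Addr to the Register File (outputs Read Data 1 and Read Data 2) and its opcode and function bits to the Control Unit]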
Executing R Format Operations (IE)
• R format operations (add, sub, slt, and, or)
– perform the (op and funct) operation on values in rs and rt
– store the result back into the Register File (into location rd)
– The Register File is not written every cycle (e.g. sw), so we need an
explicit write control signal for the Register File
R-type format (bit positions): op [31-26], rs [25-21], rt [20-16], rd [15-11], shamt [10-6], funct [5-0]
[Figure: R-format execution datapath. The Register File feeds the ALU (overflow and zero outputs), with the ALU control and RegWrite signals]
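The R-type field layout can be illustrated by packing and unpacking a 32-bit instruction word. This is a minimal sketch: the helper names are hypothetical, and the example relies on the standard MIPS conventions that `add` uses op = 0 and funct = 0x20 and that $t0, $t1, $t2 are registers 8, 9, 10.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack the six R-type fields into one 32-bit word."""
    return (
        ((op & 0x3F) << 26)       # bits 31-26
        | ((rs & 0x1F) << 21)     # bits 25-21
        | ((rt & 0x1F) << 16)     # bits 20-16
        | ((rd & 0x1F) << 11)     # bits 15-11
        | ((shamt & 0x1F) << 6)   # bits 10-6
        | (funct & 0x3F)          # bits 5-0
    )


def decode_r_type(word: int) -> dict:
    """Split a 32-bit word back into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t0, $t1, $t2  ->  rd = 8, rs = 9, rt = 10, funct = 0x20
word = encode_r_type(rs=9, rt=10, rd=8, shamt=0, funct=0x20)
print(hex(word))            # 0x12a4020
print(decode_r_type(word))
```

The decode step mirrors what the datapath does with wires: rs and rt select the two Register File read ports, rd selects the write port, and op and funct go to the control logic.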
Executing Load and Store Operations (IE)
• Load and store operations involve
– computing the memory address by adding the base register (read from the Register File during
decode) to the 16-bit sign-extended offset field in the instruction
– for a store, writing the value (read from the Register File during decode) to the Data Memory
– for a load, writing the value read from the Data Memory to the Register File
[Figure: load/store datapath. Register File, ALU, Data Memory (Address, Write Data, Read Data), sign extend (16 to 32 bits), and the RegWrite, MemWrite, MemRead, and ALU control signals]
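The address computation for loads and stores can be sketched as follows. This is an illustrative model only: registers and memory are plain Python dictionaries, and the function names are assumptions rather than anything defined in the slides.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """Base register value plus the sign-extended 16-bit offset (32-bit wraparound)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def store_word(regs: dict, mem: dict, rt: int, base_reg: int, offset16: int) -> None:
    """sw: write the value of register rt to the Data Memory."""
    mem[effective_address(regs[base_reg], offset16)] = regs[rt]


def load_word(regs: dict, mem: dict, rt: int, base_reg: int, offset16: int) -> None:
    """lw: write the value read from the Data Memory to register rt."""
    regs[rt] = mem[effective_address(regs[base_reg], offset16)]


regs = {0: 0, 1: 0, 2: 42}
mem = {}
store_word(regs, mem, rt=2, base_reg=0, offset16=100)  # sw $2, 100($0)
load_word(regs, mem, rt=1, base_reg=0, offset16=100)   # lw $1, 100($0)
print(regs[1])  # 42
```

Sign extension matters for negative offsets: the 16-bit pattern 0xFFFC represents -4, so it moves the address backward from the base.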
Executing Branch Operations (IE)
• Branch operations involve
– comparing the operands read from the
Register File during decode for equality
(zero ALU output)
– computing the branch target address by
adding the updated PC to the 16-bit
sign-extended offset field in the instruction, shifted left by 2
[Figure: branch datapath. Register File, ALU (zero output), sign extend (16 to 32 bits), shift left 2, and an adder that combines PC + 4 with the shifted offset to form the branch target address sent to the branch control logic]
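The branch target computation shown in the datapath (sign extend, shift left 2, add to PC + 4) can be sketched in a few lines. The function names are hypothetical, and the equality test models a beq-style branch.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Updated PC (PC + 4) plus the sign-extended offset shifted left by 2."""
    updated_pc = pc + 4
    return (updated_pc + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Take the branch only when the ALU subtraction yields zero (operands equal)."""
    zero = (rs_value - rt_value) == 0
    return branch_target(pc, offset16) if zero else pc + 4


print(hex(next_pc(0x1000, 5, 5, 3)))  # taken: 0x1010
print(hex(next_pc(0x1000, 5, 6, 3)))  # not taken: 0x1004
```

The shift by 2 converts an offset counted in instructions into an offset counted in bytes, since every instruction is 4 bytes long.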
Memory Access (MEM) Cycle
• If a load, the effective address computed in
the previous cycle is used to read
memory. The actual data transfer to
the register does not occur until the next
cycle.
• If a store, the data from the register is written
to the effective address in memory.
Write-Back (WB) Cycle
• Occurs with Register-Register ALU instructions
or load instructions.
• A simple operation: whether the instruction is a
register-register operation or a memory load,
the resulting data is written to the
appropriate register.
The single cycle datapath
Single Cycle Datapath with Control Unit
[Figure: single-cycle datapath with control unit. PC, Instruction Memory, Register File, ALU, Data Memory, sign extend, shift left 2, and adders, with the Control Unit decoding Instr[31-26] to drive RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, PCSrc, and ALUOp]
[Figure: R-type Instruction Data/Control Flow on the single-cycle datapath]
[Figure: Load Word Instruction Data/Control Flow on the single-cycle datapath. Exercise: what is the flow for a Store Word instruction?]
[Figure: Branch Instruction Data/Control Flow on the single-cycle datapath]
Stage latencies:
• Fetch: 2 ns
• Decode/Reg Read: 1 ns
• Execute: 2 ns
• Memory: 2 ns
• WB: 1 ns
Clock cycle time:
– Single Cycle: longest instruction time = 2 + 1 + 2 + 2 + 1 = 8 ns
– Multi Cycle: longest stage time = 2 ns
– Pipelined: longest stage time = 2 ns
Execution time (1000 instructions: 50% ALU, 10% Store, 30% Branch, 10% Load):
– Single Cycle: 1000 x 8 = 8000 ns
– Multi Cycle: 500 x 4 x 2 + 100 x 4 x 2 + 300 x 3 x 2 + 100 x 5 x 2 = 7600 ns
– Pipelined: 5 x 2 + (1000 - 1) x 2 = 2008 ns
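The three execution times in the comparison can be reproduced with a short calculation. This is a minimal sketch that uses the stage latencies, instruction mix, and per-class cycle counts from the comparison; the dictionary layout and function names are assumptions for illustration.

```python
# Stage latencies in ns.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Instruction class -> (instruction count, cycles used in the multi-cycle design).
MIX = {"alu": (500, 4), "store": (100, 4), "branch": (300, 3), "load": (100, 5)}

TOTAL_INSTRUCTIONS = sum(count for count, _ in MIX.values())


def single_cycle_ns() -> int:
    """One long cycle per instruction, sized for the longest instruction."""
    cycle = sum(STAGE_NS.values())
    return TOTAL_INSTRUCTIONS * cycle


def multi_cycle_ns() -> int:
    """Cycle sized for the longest stage; each class uses only the cycles it needs."""
    cycle = max(STAGE_NS.values())
    return sum(count * cycles * cycle for count, cycles in MIX.values())


def pipelined_ns() -> int:
    """Fill the pipeline once (k cycles), then complete one instruction per cycle."""
    cycle = max(STAGE_NS.values())
    k = len(STAGE_NS)
    return (k + (TOTAL_INSTRUCTIONS - 1)) * cycle


print(single_cycle_ns(), multi_cycle_ns(), pipelined_ns())  # 8000 7600 2008
```

The multi-cycle design gains little here because its 2 ns cycle is set by the slowest stage while most instructions still need 3 to 5 cycles.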
The Basic Pipeline For MIPS
[Figure: four instructions in program order overlapped across clock cycles 1 to 7; each instruction passes through Ifetch, Reg, ALU, DMem, and Reg, starting one cycle after the previous instruction]
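The overlapped schedule in the pipeline figure can be generated programmatically. This is a minimal sketch with no stalls or hazards; the stage labels follow the figure, while the function names and text layout are assumptions.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_schedule(num_instructions: int) -> list:
    """Row i lists the stage occupied by instruction i in each cycle ('' if idle).

    Instruction i (0-based) enters stage s (0-based) in cycle i + s (0-based).
    """
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows: list) -> str:
    """Format the schedule as a fixed-width text table."""
    header = "".join(f"{'C' + str(c + 1):<8}" for c in range(len(rows[0])))
    lines = [f"{'':<8}" + header]
    for i, row in enumerate(rows):
        lines.append(f"{'I' + str(i + 1):<8}" + "".join(f"{cell:<8}" for cell in row))
    return "\n".join(lines)


print(render(pipeline_schedule(4)))
```

With 5 stages, four instructions need 5 + (4 - 1) = 8 cycles to complete, so the last instruction finishes one cycle beyond the seven cycles labeled in the figure.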
CPU Pipelining: Example
• Example: single-cycle, non-pipelined execution
• Total time for 3 instructions: 24 ns
[Figure: program execution order (in instructions) vs. time. lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) run back to back, each taking 8 ns (instruction fetch, Reg, ALU, data access, Reg)]
CPU Pipelining: Example
• Single-cycle, pipelined execution
– Improve performance by increasing instruction throughput
– Total time for 3 instructions = 14 ns
– Each instruction adds 2 ns to total execution time
– Stage time limited by slowest resource (2 ns)
• Assumptions:
– Write to register occurs in 1st half of clock
– Read from register occurs in 2nd half of clock
[Figure: program execution order (in instructions) vs. time. lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) overlapped, each starting 2 ns after the previous one, with every stage (instruction fetch, Reg, ALU, data access, Reg) taking 2 ns]
CPU pipelining: Example
• Time without pipelining = 24 ns
• Time with pipelining = 14 ns (not 24/5). Why?
– The number of instructions is small, so the time to fill the pipeline dominates
• Let’s increase the number of instructions
– If the number of instructions = 1,000,000, the total
time with pipelining ≈ 1,000,000 x 2 ns = 2,000,000 ns
– Time without pipelining = 1,000,000 x 8 ns = 8,000,000 ns
– The speedup ≈ 4 (increased from 24/14 ≈ 1.7); it approaches 4 rather than 5 because the stages are unbalanced (8 ns of work per instruction versus a 2 ns cycle)
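The effect of the instruction count on speedup in this example can be checked numerically. This is a minimal sketch using the 8 ns unpipelined instruction time, the 2 ns pipelined cycle, and the 5 stages from the example; the function names are illustrative.

```python
def unpipelined_ns(n: int, instruction_ns: int = 8) -> int:
    """Each instruction runs to completion before the next one starts."""
    return n * instruction_ns


def pipelined_ns(n: int, stages: int = 5, cycle_ns: int = 2) -> int:
    """Fill the pipeline (stages cycles), then finish one instruction per cycle."""
    return (stages + (n - 1)) * cycle_ns


def speedup(n: int) -> float:
    return unpipelined_ns(n) / pipelined_ns(n)


for n in (3, 100, 1_000_000):
    print(n, unpipelined_ns(n), pipelined_ns(n), round(speedup(n), 3))
```

For 3 instructions the totals are 24 ns and 14 ns. For 1,000,000 instructions the pipelined total is 2,000,008 ns, which the text rounds to 2,000,000 ns, and the speedup is just under 4.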
The pipelined version of MIPS Datapath
• Need registers between stages
IF
ID
EX for Load
MEM for Load
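WB for Load
• There is a BUG here: the wrong register number is used. By the time the load reaches WB, the write register number supplied from the IF/ID stage belongs to a later instruction.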
Corrected Datapath for Load
The pipelined data path with control signals
Control Signals
CMPN301 Pipelining Hazards

  • 1. CMPN301: Computer Architecture Pipelining Mayada Hadhoud Computer Engineering Department Cairo University
  • 2. Agenda • What is pipelining? • Characteristics of pipelining • Pipelining Hazards – Structural Hazard – Data Hazard – Control Hazard
  • 3. ENGR9861 Winter 2007 RV What Is A Pipeline? • Pipelining is used by virtually all modern microprocessors to enhance performance by overlapping the execution of instructions.
  • 4. 4 What Is Pipelining • Laundry Example • 4 persons each have one load of clothes to wash, dry, and fold • Washer takes 30 minutes • Dryer takes 40 minutes • “Folder” takes 20 minutes A B C D
  • 5. 5 What Is Pipelining Sequential laundry takes 6 hours for 4 loads If they learned pipelining, how long would laundry take? A B C D 30 40 20 30 40 20 30 40 20 30 40 20 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time
  • 6. Appendix A - Pipelining 6 What Is Pipelining Start work ASAP • Pipelined laundry takes 3.5 hours for 4 loads A B C D 6 PM 7 8 9 10 11 Midnight T a s k O r d e r Time 30 40 40 40 40 20
  • 7. Appendix A - Pipelining 7 Pipelining Lessons • Pipelining doesn’t help latency of single task, it helps throughput of entire workload • Pipeline rate limited by slowest pipeline stage • Multiple tasks operating simultaneously • Potential speedup = Number pipe stages • Unbalanced lengths of pipe stages reduces speedup • Time to “fill” pipeline and time to “drain A B C D 6 PM 7 8 9 T a s k O r d e r Time 30 40 40 40 40 20 What Is Pipelining
  • 8. Pipelining Theoretical Performance • An ideal pipeline divides a task into k independent sequential subtasks – Each subtask requires 1 time unit to complete – The task itself requires k time units to complete • For n iterations of task, the execution times: – With no pipelining: nk time units – With pipelining: k + (n-1) time units • Speedup of a k-stage pipeline is – S = nk/[k+(n-1)] → = k for large n
  • 9.
  • 10. Characteristics Of Pipelining • The previous expression is ideal. • In terms of a CPU, the implementation of pipelining has the effect of reducing the average instruction time, therefore reducing the average CPI. • EX: If each instruction in a microprocessor takes 5 clock cycles (unpipelined) and we have a 4 stage pipeline, the ideal average CPI with the pipeline will be 1.25 .
  • 11. RISC Instruction Set Basics (MIPS) • Properties of RISC architectures: – All operations on data apply to data in registers and typically change the entire register (32-bits or 64-bits). – The only operations that affect memory are load/store operations. Memory to register and register to memory. – Usually, instructions are few and are typically one size.
  • 12. • ALU Instructions (R-type): • Arithmetic operations, take two registers as operands. The result is stored in a third register. • Logical operations AND OR, XOR, shift RISC Instruction Set Basics (MIPS) Types of Instructions
  • 14. Immediate Format Instructions (I-type): • Usually take a register (base register) as an operand and a 16-bit immediate value. The sum of the two will create the effective address. A second register acts as a source in the case of a load operation. • In the case of a store operation the second register contains the data to be stored. RISC Instruction Set Basics (MIPS) Types of Instructions
  • 16. Jump Format (J-type) • Conditional branches are transfers of control. As described before, a branch causes an immediate value to be added to the current program counter. RISC Instruction Set Basics (MIPS) Types of Instructions
  • 17.
  • 18. RISC Instruction Set Implementation • We first need to look at how instructions in the MIPS instruction set are implemented without pipelining. We’ll assume that any instruction of the subset of MIPS can be executed in at most 5 clock cycles. • The five clock cycles will be broken up into the following steps: • Instruction Fetch Cycle • Instruction Decode/Register Fetch Cycle • Execution Cycle • Memory Access Cycle • Write- Back
  • 19. Fetching Instructions (IF) • Fetching instructions involves – reading the instruction from the Instruction Memory – updating the PC to hold the address of the next instruction – PC is updated every cycle, so it does not need an explicit write control signal – Instruction Memory is read every cycle, so it doesn’t need an explicit read control signal Read Address Instruction Instruction Memory Add PC 4
  • 20. Decoding Instructions (ID) • Decoding instructions involves – sending the fetched instruction’s opcode and function field bits to the control unit – reading two values from the Register File • Register File addresses are contained in the instruction Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 Control Unit
  • 21. Executing R Format Operations (IE) • R format operations (add,sub,slt,and,or) – perform the (op and funct) operation on values in rs and rt – store the result back into the Register File (into location rd) – The Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File R-type: 31 25 20 15 5 0 op rs rt rd funct shamt 10 Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU overflow zero ALU control RegWrite
  • 22. Executing Load and Store Operations (IE) • Load and store operations involve – compute memory address by adding the base register (read from the Register File during decode) to the 16-bit signed-extended offset field in the instruction – store value (read from the Register File during decode) written to the Data Memory – load value, read from the Data Memory, written to the Register File Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU overflow zero ALU control RegWrite Data Memory Address Write Data Read Data Sign Extend MemWrite MemRead 16 32
  • 23. Executing Branch Operations (IE) • Branch operations involves – compare the operands read from the Register File during decode for equality (zero ALU output) – compute the branch target address by adding the updated PC to the 16-bit signed-extended offset field in the instr Instruction Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero ALU control Sign Extend 16 32 Shift left 2 Add 4 Add PC Branch target address (to branch control logic)
  • 24. Memory Access (MEM) Cycle • If the instruction is a load, memory is read at the effective address computed in the previous cycle. The data is not written to the register until the next cycle. • If the instruction is a store, the data from the register is written to memory at the effective address.
  • 25. Write-Back (WB) Cycle • Occurs for register-register ALU instructions and load instructions. • A simple operation: whether the result comes from a register-register operation or a memory load, it is written to the appropriate register.
  • 26. The single cycle datapath
  • 27. Single Cycle Datapath with Control Unit [Figure: single-cycle datapath with PC, Instruction Memory, Register File, ALU, Data Memory, Sign Extend and Shift left 2 units; the Control Unit decodes Instr[31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; PCSrc selects the next PC]
  • 28. R-type Instruction Data/Control Flow [Figure: same datapath as slide 27]
  • 29. Load Word Instruction Data/Control Flow • Store Word Instruction? [Figure: same datapath as slide 27]
  • 30. Branch Instruction Data/Control Flow [Figure: same datapath as slide 27]
  • 31. Stage times: Fetch 2 ns, Decode/Reg Read 1 ns, Execute 2 ns, Memory 2 ns, WB 1 ns • Clock cycle time – Single cycle: longest instruction time = 2 + 1 + 2 + 2 + 1 = 8 ns – Multi cycle: longest stage time = 2 ns – Pipelined: longest stage time = 2 ns • Execution time (1000 instructions: 50% ALU, 10% store, 30% branch, 10% load) – Single cycle: 1000 x 8 = 8000 ns – Multi cycle: 500 x 4 x 2 + 100 x 4 x 2 + 300 x 3 x 2 + 100 x 5 x 2 = 7600 ns – Pipelined: 5 x 2 + (1000 - 1) x 2 = 2008 ns
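The three execution-time figures on this slide can be reproduced with a short script. The stage times, instruction mix and cycles per instruction class are read off the slide's own arithmetic; the dictionary and function names are assumptions made for this sketch.

```python
# Stage latencies in ns, as given on the slide.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}
# Cycles each instruction class takes in the multi-cycle design (from the slide).
CYCLES = {"alu": 4, "store": 4, "branch": 3, "load": 5}
# Instruction mix in percent (from the slide).
MIX_PERCENT = {"alu": 50, "store": 10, "branch": 30, "load": 10}

def single_cycle_ns(n):
    """Every instruction takes one long cycle covering all stages."""
    return n * sum(STAGE_NS.values())

def multi_cycle_ns(n):
    """Each class takes its own number of cycles of the longest stage time."""
    clock = max(STAGE_NS.values())
    return sum((n * MIX_PERCENT[k] // 100) * CYCLES[k] * clock for k in CYCLES)

def pipelined_ns(n):
    """Fill the pipeline once, then complete one instruction per cycle."""
    clock = max(STAGE_NS.values())
    return (len(STAGE_NS) + n - 1) * clock

print(single_cycle_ns(1000), multi_cycle_ns(1000), pipelined_ns(1000))
# prints: 8000 7600 2008
```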
  • 32. The Basic Pipeline For MIPS [Figure: four instructions in program order, each passing through Ifetch, Reg, ALU, DMem, Reg, overlapped across Cycle 1 to Cycle 7]
  • 33. CPU Pipelining: Example • Example: single-cycle, non-pipelined execution • Total time for 3 instructions: 24 ns (8 ns each) [Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) executed back to back, each passing through Instruction fetch, Reg, ALU, Data access, Reg in 8 ns; horizontal axis: time in ns; vertical axis: program execution order (in instructions)]
  • 34. CPU Pipelining: Example • Single-cycle, pipelined execution • Improve performance by increasing instruction throughput • Total time for 3 instructions = 14 ns • Each additional instruction adds 2 ns to the total execution time • Stage time limited by the slowest resource (2 ns) • Assumptions: – Write to register occurs in the 1st half of the clock cycle – Read from register occurs in the 2nd half of the clock cycle [Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) overlapped, each passing through Instruction fetch, Reg, ALU, Data access, Reg in 2 ns stages and starting 2 ns after the previous one; horizontal axis: time in ns; vertical axis: program execution order (in instructions)]
  • 35. CPU pipelining: Example • Time without pipelining = 24 ns • Time with pipelining = 14 ns (not 24/5 = 4.8 ns). Why? – The number of instructions is small, so the time to fill the pipeline dominates • Let's increase the number of instructions – If the number of instructions = 1,000,000, the total time with pipelining ≈ 1,000,000 x 2 ns = 2,000,000 ns – Time without pipelining = 1,000,000 x 8 ns = 8,000,000 ns – The speedup ≈ 4 (up from 24/14 ≈ 1.7 for 3 instructions)
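The effect of the instruction count on speedup can be checked numerically. This sketch assumes the 5-stage, 2 ns per stage pipeline and the 8 ns single-cycle time from the example; the function name `speedup` and its parameter names are illustrative only.

```python
def speedup(n, stages=5, stage_ns=2, single_cycle_ns=8):
    """Ratio of non-pipelined to pipelined time for n instructions."""
    unpipelined = n * single_cycle_ns
    pipelined = (stages + n - 1) * stage_ns  # fill time plus one result per cycle
    return unpipelined / pipelined

for n in (3, 100, 1_000_000):
    print(n, round(speedup(n), 3))
```

With these numbers the speedup approaches 8/2 = 4 rather than 5, because the stages are unbalanced: five 2 ns stages add up to 10 ns, while the single-cycle instruction takes only 8 ns.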
  • 36. The pipelined version of MIPS Datapath • Need registers between stages –
  • 37. IF
  • 38. ID
  • 43. The pipelined data path with control signals