PROCESSOR AND CONTROL UNIT

Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Slide Sources: Patterson & Hennessy COD book
website (copyright Morgan Kaufmann) adapted
and supplemented
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
Asst. Prof,
CSE,
Velammal Engineering College

UNIT III
PROCESSOR AND
CONTROL UNIT
Topics:
 Basic MIPS implementation – Building datapath –
Control Implementation scheme – Pipelining –
Pipelined datapath and control – Handling Data
hazards & Control hazards – Exceptions.
2

Implementing MIPS
• We're ready to look at an implementation of the MIPS instruction set
• Simplified to contain only
o arithmetic-logic instructions: add, sub, and, or, slt
o memory-reference instructions: lw, sw
o control-flow instructions: beq, j
op rs rt offset
6 bits 5 bits 5 bits 16 bits
op rs rt rd funct
shamt
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-Format
I-Format
op address
6 bits 26 bits
J-Format
3

Implementing MIPS: the
Fetch/Execute Cycle
• High-level abstract view of fetch/execute implementation
o use the program counter (PC) to read instruction address
o fetch the instruction from memory and increment PC
o use fields of the instruction to select registers to read
o execute depending on the instruction
o repeat…
Registers
Register #
Data
Register #
Data
memory
Address
Data
Register #
PC Instruction ALU
Instruction
memory
Address
4

The basic implementation of the MIPS subset, including the necessary
multiplexors and control lines
5

Overview: Processor
Implementation Styles
• Single Cycle
o perform each instruction in 1 clock cycle
o clock cycle must be long enough for slowest instruction; therefore,
o disadvantage: only as fast as slowest instruction
• Multi-Cycle
o break fetch/execute cycle into multiple steps
o perform 1 step in each clock cycle
o advantage: each instruction uses only as many cycles as it needs
• Pipelined
o execute each instruction in multiple steps
o perform 1 step / instruction in each clock cycle
o process multiple instructions in parallel – assembly line
6

2.Building datapath
 datapath element
 A unit used to operate on or hold data within a processor.
 MIPS – datapath elements:
1. Program Counter(PC)
2. Instruction Memory
3. Register File
4. ALU
5. Data Memory
7

Datapath: Instruction
Store/Fetch & PC Increment
PC
Instruction
memory
Instruction
address
Instruction
a. Instruction memory b. Program counter
Add Sum
c. Adder
PC
Instruction
memory
Read
address
Instruction
4
Add
Three elements used to store
and fetch instructions and
increment the PC
Datapath
8

Animating the Datapath
Instruction <- MEM[PC]
PC <- PC + 4
RD
Memory
ADDR
PC
Instruction
4
ADD
9

Datapath: R-Type
Instruction
ALU control
RegWrite
Registers
Write
register
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Write
data
ALU
result
ALU
Data
Data
Register
numbers
a. Registers b. ALU
Zero
5
5
5 3
Instruction
Registers
Write
register
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Write
data
ALU
result
ALU
Zero
RegWrite
ALU operation
3
Two elements used to implement
R-type instructions
Datapath
op rs rt rd funct
shamt
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-Format
4 4
10

add rd, rs, rt
R[rd] <- R[rs] + R[rt];
5 5 5
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
op rs rt rd funct
shamt
Operation
ALU Zero
Instruction
3
11

Datapath:
Load/Store Instruction
16 32
Sign
extend
b. Sign-extension unit
MemRead
MemWrite
Data
memory
Write
data
Read
data
a. Data memory unit
Address
Instruction
16 32
Registers
Write
register
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Data
memory
Write
data
Read
data
Write
data
Sign
extend
ALU
result
Zero
ALU
Address
MemRead
MemWrite
RegWrite
ALU operation
3
Two additional elements used
To implement load/stores
Datapath
op rs rt offset
6 bits 5 bits 5 bits 16 bits
I-Format
12

lw rt, offset(rs)
R[rt] <- MEM[R[rs] + s_extend(offset)];
13

sw rt, offset(rs)
MEM[R[rs] + sign_extend(offset)] <- R[rt]
14

Datapath: Branch
Instruction
16 32
Sign
extend
Zero
ALU
Sum
Shift
left 2
To branch
control logic
Branch target
PC + 4 from instruction datapath
Instruction
Add
Registers
Write
register
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Write
data
RegWrite
ALU operation
3
Datapath
No shift hardware required:
simply connect wires from
input to output, each shifted
left 2 bits
15

beq rs, rt, offset
if (R[rs] == R[rt]) then
PC <- PC+4 + s_extend(offset<<2)
16

MIPS Datapath
PC
Instruction
memory
Read
address
Instruction
16 32
Add ALU
result
M
u
x
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Shift
left 2
4
M
u
x
ALU operation
3
RegWrite
MemRead
MemWrite
PCSrc
ALUSrc
MemtoReg
ALU
result
Zero
ALU
Data
memory
Address
Write
data
Read
data M
u
x
Sign
extend
Add
Adding branch capability and another multiplexor
Instruction address is either
PC+4 or branch target address
Extra adder needed as both
adders operate in each cycle
New multiplexor
17

Datapath Executing add
add rd, rs, rt
5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
18

Datapath Executing lw
lw rt,offset(rs)
5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
19

Datapath Executing sw
sw rt,offset(rs)
5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
20

Datapath Executing beq
beq r1,r2,offset
5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
21

Control Implementation
Scheme
• Control unit takes input from
o the instruction opcode bits
• Control unit generates
o ALU control input
o write enable (possibly, read enable also) signals for each storage
element
o selector controls for each multiplexor
22

ALU Control
• Plan to control ALU: main control sends a 2-bit ALUOp control field to the ALU
control. Based on ALUOp and funct field of instruction the ALU control generates
the 4-bit ALU control field
o ALU control Func-
field tion
0000 and
0001 or
0010 add
0110 sub
0111 slt
• ALU must perform
o add for load/stores (ALUOp 00)
o sub for branches (ALUOp 01)
o one of and, or, add, sub, slt for R-type instructions, depending on the instruction’s 6-bit
funct field (ALUOp 10)
Main
Control
ALU
Control
2
ALUOp
6
Instruction
funct field
4
ALU
control
input
To
ALU
23

Setting ALU Control Bits
Instruction AluOp Instruction Funct Field Desired ALU control
opcode operation ALU action input
LW 00 load word xxxxxx add 0010
SW 00 store word xxxxxx add 0010
Branch eq 01 branch eq xxxxxx subtract 0110
R-type 10 add 100000 add 0010
R-type 10 subtract 100010 subtract 0110
R-type 10 AND 100100 and 0000
R-type 10 OR 100101 or 0001
R-type 10 set on less 101010 set on less 0111
Truth table for ALU control bits
ALUOp Funct field Operation
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0
0 0 X X X X X X 0010
X 1 X X X X X X 0110
1 X X X 0 0 0 0 0010
1 X X X 0 0 1 0 0110
1 X X X 0 1 0 0 0000
1 X X X 0 1 0 1 0001
1 X X X 1 0 1 0 0111 24

Designing the Main
Control
• Observations about MIPS instruction format
o opcode is always in bits 31-26
o two registers to be read are always rs (bits 25-21) and rt (bits 20-
16)
o base register for load/stores is always rs (bits 25-21)
o 16-bit offset for branch equal and load/store is always bits 15-0
o destination register for loads is in bits 20-16 (rt) while for R-type
instructions it is in bits 15-11 (rd) (will require multiplexor to select)
31-26 25-21 20-16 15-11 10-6 5-0
31-26 25-21 20-16 15-0
opcode
opcode
rs
rs
rt
rt address
rd shamt funct
R-type
Load/store
or branch
25

The Main Control Unit
• Control signals derived from instruction
0 rs rt rd shamt funct
31:26 5:0
25:21 20:16 15:11 10:6
35 or 43 rs rt address
31:26 25:21 20:16 15:0
4 rs rt address
31:26 25:21 20:16 15:0
R-type
Load/
Store
Branch
opcode always
read
read,
except
for load
write for
R-type
and load
sign-extend
and add
26

Datapath with Control I
New multiplexor
27

Control Signals
Signal Name Effect when deasserted Effect when asserted
RegDst The register destination number for the The register destination number for the
Write register comes from the rt field (bits 20-16) Write register comes from the rd field (bits 15-11)
RegWrite None The register on the Write register input is written
with the value on the Write data input
AlLUSrc The second ALU operand comes from the The second ALU operand is the sign-extended,
second register file output (Read data 2) lower 16 bits of the instruction
PCSrc The PC is replaced by the output of the adder The PC is replaced by the output of the adder
that computes the value of PC + 4 that computes the branch target
MemRead None Data memory contents designated by the address
input are put on the first Read data output
MemWrite None Data memory contents designated by the address
input are replaced by the value of the Write data input
MemtoReg The value fed to the register Write data input The value fed to the register Write data input
comes from the ALU comes from the data memory
Effects of the seven control signals
28

Datapath with Control II
PC
Instruction
memory
Read
address
Instruction
[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32
Instruction [15 0]
0
0
M
u
x
0
1
Control
Add
ALU
result
M
u
x
0
1
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Sign
extend
M
u
x
1
ALU
result
Zero
PCSrc
Data
memory
Write
data
Read
data
M
u
x
1
Instruction [15 11]
ALU
control
Shift
left 2
ALU
Address
MIPS datapath with the control unit: input to control is the 6-bit instruction
opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal
29

Datapath with Control II (cont.)
Instruction RegDst ALUSrc
Memto-
Reg
Reg
Write
Mem
Read
Mem
Write Branch ALUOp1 ALUp0
R-format 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
PC
Instruction
memory
Read
address
Instruction
[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32
Instruction [15 0]
0
0
M
u
x
0
1
Control
Add
ALU
result
M
u
x
0
1
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Sign
extend
M
u
x
1
ALU
result
Zero
PCSrc
Data
memory
Write
data
Read
data
M
u
x
1
Instruction [15 11]
ALU
control
Shift
left 2
ALU
Address
Determining control signals for the MIPS datapath based on instruction opcode
PCSrc cannot be
set directly from the
opcode: zero test
outcome is required
30

Control Signals:
R-Type Instruction
Control signals
shown in blue
1
0
0
0
1
???
Value depends on
funct
0
0
5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
I
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
MUX
RegDst
5
rd
I[15:11]
rt
I[20:16]
rs
I[25:21]
immediate/
offset
I[15:0]
0
1
0
1
1
0
1
0
4
31

5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
I
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
MUX
RegDst
5
rd
I[15:11]
rt
I[20:16]
rs
I[25:21]
immediate/
offset
I[15:0]
0
1
0
1
1
0
1
0
Control Signals:
lw Instruction
0
Control signals
shown in blue
0
0010
1
1
1
0
1
32

5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
I
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
MUX
RegDst
5
rd
I[15:11]
rt
I[20:16]
rs
I[25:21]
immediate/
offset
I[15:0]
0
1
0
1
1
0
1
0
Control Signals:
sw Instruction
0
Control signals
shown in blue
X
0010
1
X
0
1
0
33

5 5
16
RD1
RD2
RN1 RN2 WN
WD
RegWrite
Register File
Operation
ALU
3
E
X
T
N
D
16 32
Zero
RD
WD
MemRead
Data
Memory
ADDR
MemWrite
5
Instruction
I
32
M
U
X
ALUSrc
MemtoReg
ADD
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
M
U
X
PCSrc
MUX
RegDst
5
rd
I[15:11]
rt
I[20:16]
rs
I[25:21]
immediate/
offset
I[15:0]
0
1
0
1
1
0
1
0
Control Signals:
beq Instruction
Control signals
shown in blue
X
0110
0
X
0
0
0
1 if Zero=1
34

Datapath with Control III
Shift
left 2
PC
Instruction
memory
Read
address
Instruction
[31– 0]
Data
memory
Read
data
Write
data
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Instruction [15– 11]
Add
ALU
result
Zero
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
Jump
RegDst
ALUSrc
4
M
u
x
Instruction [25– 0] Jump address [31– 0]
PC+4 [31– 28]
Sign
extend
16 32
1
M
u
x
1
0
M
u
x
0
1
M
u
x
0
1
ALU
control
Control
Add ALU
result
M
u
x
0
1 0
ALU
Shift
left 2
26 28
Address
31-26 25-0
opcode address
Jump
MIPS datapath extended to jumps: control unit generates new Jump control bit
New multiplexor with additional
control bit Jump
Composing jump
target address
35

Datapath with Control III
Shift
left 2
PC
Instruction
memory
Read
address
Instruction
[31– 0]
Data
memory
Read
data
Write
data
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Add
ALU
result
Zero
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
Jump
RegDst
ALUSrc
4
M
u
x
Instruction [25– 0] Jump address [31– 0]
PC+4 [31– 28]
Sign
extend
16 32
1
M
u
x
1
0
M
u
x
0
1
M
u
x
0
1
ALU
control
Control
Add ALU
result
M
u
x
0
1 0
ALU
Shift
left 2
26 28
Address
31-26 25-0
opcode address
Jump
MIPS datapath extended to jumps: control unit generates new Jump control bit
New multiplexor with additional
control bit Jump
Composing jump
target address
37

Enhancing Performance with
Pipelining
 An implementation technique in which
multiple instructions are overlapped in execution.
38

Pipelining
• Start work ASAP!! Do not waste time!
Time
7
6 PM 8 9 10 11 12 1 2 AM
A
B
C
D
Time
7
6 PM 8 9 10 11 12 1 2 AM
A
B
C
D
Task
order
Task
order
Assume 30 min. each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped
Pipelined
Not pipelined
39

Breaking instructions
into steps
• We break instructions into the following potential execution steps –
not all instructions require all the steps – each step takes one
clock cycle
1. Instruction fetch and PC increment (IF)
2. Instruction decode and register fetch (ID)
3. Execution, memory address computation, or branch completion (EX)
4. Memory access or R-type instruction completion (MEM)
5. Memory read completion (WB)
• Each MIPS instruction takes from 3 – 5 cycles (steps)
40

Step 1: Instruction Fetch
& PC Increment (IF)
• Use PC to get instruction and put it in the instruction register.
Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL (Register-Transfer Language):
IR = Memory[PC];
PC = PC + 4;
41

Step 2: Instruction Decode
and Register Fetch (ID)
• Read registers rs and rt in case we need them.
Compute the branch address in case the instruction is a branch.
• RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
42

Step 3: Execution, Address
Computation or Branch
Completion (EX)
• ALU performs one of four functions depending on instruction
type
o memory reference:
ALUOut = A + sign-extend(IR[15-0]);
o R-type:
ALUOut = A op B;
o branch (instruction completes):
if (A==B) PC = ALUOut;
o jump (instruction completes):
PC = PC[31-28] || (IR(25-0) << 2)
43

Step 4: Memory access or
R-type Instruction
Completion
(MEM)
• Again depending on instruction type:
• Loads and stores access memory
o load
MDR = Memory[ALUOut];
o store (instruction completes)
Memory[ALUOut] = B;
• R-type (instructions completes)
Reg[IR[15-11]] = ALUOut;
44

Step 5: Memory Read
Completion (WB)
• Again depending on instruction type:
• Load writes back (instruction completes)
Reg[IR[20-16]]= MDR;
Important: There is no reason from a datapath (or control) point of
view that Step 5 cannot be eliminated by performing
Reg[IR[20-16]]= Memory[ALUOut];
for loads in Step 4. This would eliminate the MDR as well.
The reason this is not done is that, to keep steps balanced in
length, the design restriction is to allow each step to contain at
most one ALU operation, or one register access, or one
memory access.
45

Summary of Instruction
Execution
Step name
Action for R-type
instructions
Action for memory-reference
instructions
Action for
branches
Action for
jumps
Instruction fetch IR = Memory[PC]
PC = PC + 4
Instruction A = Reg [IR[25-21]]
decode/register fetch B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)
Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] II
computation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)
jump completion
Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]
completion ALUOut or
Store: Memory [ALUOut] = B
Memory read completion Load: Reg[IR[20-16]] = MDR
1: IF
2: ID
3: EX
4: MEM
5: WB
Step
46

Pipelined vs. Single-Cycle
Instruction Execution: the
Plan
Instruction
fetch
Reg ALU
Data
access
Reg
8 ns
Instruction
fetch
Reg ALU
Data
access
Reg
8 ns
Instruction
fetch
8 ns
Time
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2 4 6 8 10 12 14 16 18
2 4 6 8 10 12 14
...
Program
execution
order
(in instructions)
Instruction
fetch
Reg ALU
Data
access
Reg
Time
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns 2 ns 2 ns 2 ns 2 ns
Program
execution
order
(in instructions)
Single-cycle
Pipelined
Assume 2 ns for memory access, ALU operation; 1 ns for register access:
therefore, single cycle clock 8 ns; pipelined clock cycle 2 ns.
47

Pipelining: Keep in
Mind
• Pipelining does not reduce latency of a single task, it
increases throughput of entire workload
• Pipeline rate limited by longest stage
o potential speedup = number pipe stages
o unbalanced lengths of pipe stages reduces speedup
• Time to fill pipeline and time to drain it – when there is
slack in the pipeline – reduces speedup
48

Pipelining MIPS
• What makes it easy with MIPS?
o all instructions are same length
• so fetch and decode stages are similar for all instructions
o just a few instruction formats
• simplifies instruction decode and makes it possible in one stage
o memory operands appear only in load/stores
• so memory access can be deferred to exactly one later stage
o operands are aligned in memory
• one data transfer instruction requires one memory access stage
49

Pipelining MIPS
• What makes it hard?
o structural hazards: different instructions, at different stages,
in the pipeline want to use the same hardware resource
o control hazards: succeeding instruction, to put into pipeline,
depends on the outcome of a previous branch instruction,
already in pipeline
o data hazards: an instruction in the pipeline requires data to
be computed by a previous instruction still in the pipeline
• Before actually building the pipelined datapath and control
we first briefly examine these potential hazards
individually…
50

Structural Hazards
• Structural hazard: inadequate hardware to simultaneously
support all instructions in the pipeline in the same clock cycle
• E.g., suppose single – not separate – instruction and data
memory in pipeline below with one read port
o then a structural hazard between first and fourth lw instructions
• MIPS was designed to be pipelined: structural hazards are easy
to avoid!
2 4 6 8 10 12 14
Instruction
fetch
Reg ALU
Data
access
Reg
Time
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns 2 ns 2 ns 2 ns 2 ns
Program
execution
order
(in instructions)
Pipelined
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
lw $4, 400($0)
Hazard if single memory
51

Control Hazards
• Control hazard: need to make a decision based on the result of a
previous instruction still executing in pipeline
• Solution 1 Stall the pipeline
Instruction
fetch
Reg ALU
Data
access
Reg
Time
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)
4 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2ns
Instruction
fetch
Reg ALU
Data
access
Reg
2ns
2 4 6 8 10 12 14 16
Program
execution
order
(in instructions)
Pipeline stall
bubble
Note that branch outcome is
computed in ID stage with
added hardware (later…)
52

Control Hazards
• Solution 2 Predict branch outcome
o e.g., predict branch-not-taken :
Instruction
fetch
Reg ALU
Data
access
Reg
Time
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
Program
execution
order
(in instructions)
Instruction
fetch
Reg ALU
Data
access
Reg
Time
beq $1, $2, 40
add $4, $5 ,$6
or $7, $8, $9
Instruction
fetch
Reg ALU
Data
access
Reg
2 4 6 8 10 12 14
2 4 6 8 10 12 14
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
4 ns
bubble bubble bubble bubble bubble
Program
execution
order
(in instructions)
Prediction success
Prediction failure: undo (=flush) lw 53

Control Hazards
• Solution 3 Delayed branch: always execute the sequentially next
statement with the branch executing after one instruction delay
– compiler’s job to find a statement that can be put in the slot
that is independent of branch outcome
o MIPS does this – but it is an option in SPIM (Simulator -> Settings)
Instruction
fetch
Reg ALU
Data
access
Reg
Time
beq $1, $2, 40
add $4, $5, $6
lw $3, 300($0)
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
Instruction
fetch
Reg ALU
Data
access
Reg
2 ns
2 4 6 8 10 12 14
2 ns
(d elayed branch slot)
Program
execution
order
(in instructions)
Delayed branch beq is followed by add that is
independent of branch outcome
54

Data Hazards
• Data hazard: instruction needs data from the result of a previous
instruction still executing in pipeline
• Solution Forward data if possible…
Time
2 4 6 8 10
add $s0, $t0, $t1 IF ID WB
EX MEM
add $s0, $t0, $t1
sub $t2, $s0, $t3
Program
execution
order
(in instructions)
IF ID WB
EX
IF ID MEM
EX
Time
2 4 6 8 10
MEM
WB
MEM
Instruction pipeline diagram:
shade indicates use –
left=write, right=read
Without forwarding – blue line –
data has to go back in time;
with forwarding – red line –
data is available in time
55

Data Hazards
• Forwarding may not be enough
o e.g., if an R-type instruction following a load uses the result of the
load – called load-use data hazard
Time
2 4 6 8 10 12 14
lw $s0, 20($t1)
sub $t2, $s0, $t3
Program
execution
order
(in instructions)
IF ID WB
MEM
EX
IF ID WB
MEM
EX
Time
2 4 6 8 10 12 14
lw $s0, 20($t1)
sub $t2, $s0, $t3
Program
execution
order
(in instructions)
IF ID WB
MEM
EX
IF ID WB
MEM
EX
bubble bubble bubble bubble bubble
With a one-stage stall, forwarding
can get the data to the sub
instruction in time
Without a stall it is impossible
to provide input to the sub
instruction in time
56

Reordering Code to Avoid
Pipeline Stall (Software
Solution)
• Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
• Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
Data hazard
Interchanged
57

Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
o all 5 steps done in a single clock cycle
o dedicated hardware required for each step
• What happens if we break the execution into multiple cycles, but
keep the extra hardware?
58

Review - Single-Cycle
Datapath “Steps”
5 5
16
RD1
RD2
RN1 RN2 WN
WD
Register File ALU
E
X
T
N
D
16 32
RD
WD
Data
Memory
ADDR
5
Instruction I
32
M
U
X
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
32
IF
Instruction Fetch
ID
Instruction Decode
EX
Execute/ Address Calc.
MEM
Memory Access
WB
Write Back
Zero
59

Pipelined Datapath – Key
Idea
• What happens if we break the execution into multiple cycles, but keep the
extra hardware?
o Answer: We may be able to start executing a new instruction at each clock
cycle - pipelining
• …but we shall need extra registers to hold data between cycles –
pipeline registers
60

Pipelined Datapath
IF/ID
Pipeline registers
5 5
16
RD1
RD2
RN1 RN2 WN
WD
Register File ALU
E
X
T
N
D
16 32
RD
WD
Data
Memory
ADDR
5
Instruction I
32
M
U
X
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
32
ID/EX EX/MEM MEM/WB
Zero
64 bits
97 bits 64 bits
128 bits
wide enough to hold data coming in
61

Pipelined Datapath
IF/ID
Pipeline registers
5 5
16
RD1
RD2
RN1 RN2 WN
WD
Register File ALU
E
X
T
N
D
16 32
RD
WD
Data
Memory
ADDR
5
Instruction I
32
M
U
X
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
32
ID/EX EX/MEM MEM/WB
Zero
64 bits
97 bits 64 bits
128 bits
wide enough to hold data coming in
Only data flowing right to left may cause hazard…, why? 62

Bug in the Datapath
5 5
16
RD1
RD2
RN1 RN2 WN
WD
Register File ALU
E
X
T
N
D
16 32
RD
WD
Data
Memory
ADDR
5
Instruction I
32
M
U
X
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
32
Write register number comes from another later instruction!
IF/ID ID/EX EX/MEM MEM/WB
63

Corrected Datapath
5
RD1
RD2
RN1
RN2
WN
WD
Register
File
ALU
E
X
T
N
D
16 32
RD
WD
Data
Memory
ADDR
32
M
U
X
<<2
RD
Instruction
Memory
ADDR
PC
4
ADD
ADD
M
U
X
5
5
5
EX/MEM MEM/WB
Zero
ID/EX
IF/ID
64 bits 133 bits
102 bits 69 bits
Destination register number is also passed through ID/EX, EX/MEM
and MEM/WB registers, which are now wider by 5 bits 64

Pipelined Example
• Consider the following instruction sequence:
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10
65

Single-Clock-Cycle Diagram:
Clock Cycle 1
LW
66

Clock Cycle 2
LW
SW
67

Clock Cycle 3
LW
SW
ADD
68

Clock Cycle 4 LW
SW
ADD
SUB
69

Clock Cycle 5 LW
SW
ADD
SUB
70

Clock Cycle 6 SW
ADD
SUB
71

Clock Cycle 7 ADD
SUB
72

Clock Cycle 8 SUB
73

Multiple-Clock-Cycle
Diagram
IM REG ALU DM REG
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7
IM REG ALU DM REG
IM REG ALU DM REG
sub $t8, $t9, $t10 IM REG ALU DM REG
CC 8
Time axis
74

Recall Single-Cycle
Control – the Datapath
PC
Instruction
memory
Read
address
Instruction
[31– 0]
Instruction [20 16]
Instruction [25 21]
Add
Instruction [5 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
Branch
RegDst
ALUSrc
Instruction [31 26]
4
16 32
Instruction [15 0]
0
0
M
u
x
0
1
Control
Add
ALU
result
M
u
x
0
1
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Sign
extend
M
u
x
1
ALU
result
Zero
PCSrc
Data
memory
Write
data
Read
data
M
u
x
1
Instruction [15 11]
ALU
control
Shift
left 2
ALU
Address
75

Recall Single-Cycle – ALU Control
Instruction AluOp Instruction Funct Field Desired ALU control
opcode operation ALU action input
LW 00 load word xxxxxx add 010
SW 00 store word xxxxxx add 010
Branch eq 01 branch eq xxxxxx subtract 110
R-type 10 add 100000 add 010
R-type 10 subtract 100010 subtract 110
R-type 10 AND 100100 and 000
R-type 10 OR 100101 or 001
R-type 10 set on less 101010 set on less 111
Truth table for ALU control bits
ALUOp Funct field Operation
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0
0 0 X X X X X X 010
0 1 X X X X X X 110
1 X X X 0 0 0 0 010
1 X X X 0 0 1 0 110
1 X X X 0 1 0 0 000
1 X X X 0 1 0 1 001
1 X X X 1 0 1 0 111 76

Signals
Signal Name Effect when deasserted Effect when asserted
RegDst The register destination number for the The register destination number for the
Write register comes from the rt field (bits 20-16) Write register comes from the rd field (bits 15-11)
RegWrite None The register on the Write register input is written
with the value on the Write data input
AlLUSrc The second ALU operand comes from the The second ALU operand is the sign-extended,
second register file output (Read data 2) lower 16 bits of the instruction
PCSrc The PC is replaced by the output of the adder The PC is replaced by the output of the adder
that computes the value of PC + 4 that computes the branch target
MemRead None Data memory contents designated by the address
input are put on the first Read data output
MemWrite None Data memory contents designated by the address
input are replaced by the value of the Write data input
MemtoReg The value fed to the register Write data input The value fed to the register Write data input
comes from the ALU comes from the data memory
Instruction RegDst ALUSrc
Memto-
Reg
Reg
Write
Mem
Read
Mem
Write Branch ALUOp1 ALUp0
R-format 1 0 0 1 0 0 0 1 0
lw 0 1 1 1 1 0 0 0 0
sw X 1 X 0 0 1 0 0 0
beq X 0 X 0 0 0 1 0 1
Effect of control bits
Deter-
mining
control
bits
77

Pipeline Control
• Initial design – motivated by single-cycle datapath control – use
the same control signals
• Observe:
o No separate write signal for the PC as it is written every cycle
o No separate write signals for the pipeline registers as they are written
every cycle
o No separate read signal for instruction memory as it is read every
clock cycle
o No separate read signal for register file as it is read every clock cycle
• Need to set control signals during each pipeline stage
• Since control signals are associated with components active
during a single pipeline stage, can group control lines into five
groups according to pipeline stage
Will be
modified
by hazard
detection
unit!!
78

Pipelined Datapath with
Control I
PC
Instruction
memory
Address
Instruction
Instruction
[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32
Instruction
[15– 0]
0
0
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Sign
extend
M
u
x
1
Write
data
Read
data M
u
x
1
ALU
control
RegWrite
MemRead
Instruction
[15– 11]
6
MemWrite
Address
Data
memory
PCSrc
Zero
Add
Add
result
Shift
left 2
ALU
result
ALU
Zero
Add
0
1
M
u
x
0
1
M
u
x
Same control
signals as the
single-cycle
datapath
79

Pipeline Control Signals
• There are five stages in the pipeline
o instruction fetch / PC increment
o instruction decode / register fetch
o execution / address calculation
o memory access
o write back
Execution/Address Calculation
stage control lines
Memory access stage
control lines
Write-back
stage control
lines
Instruction
Reg
Dst
ALU
Op1
ALU
Op0
ALU
Src Branch
Mem
Read
Mem
Write
Reg
write
Mem to
Reg
R-format 1 1 0 0 0 0 0 1 0
lw 0 0 0 1 0 1 0 1 1
sw X 0 0 1 0 0 1 0 X
beq X 0 1 0 1 0 0 0 X
Nothing to control as instruction memory
read and PC write are always enabled
80

Pipeline Control
Implementation
• Pass control signals along just like the data – extend each pipeline register to hold needed control bits for succeeding
stages
• Note: The 6-bit funct field of the instruction required in the EX stage to generate ALU control can be retrieved as the 6
least significant bits of the immediate field which is sign-extended and passed from the IF/ID register to the ID/EX
register
Control
EX
M
WB
M
WB
WB
Instruction
81

Pipelined Datapath with
Control II
PC
Instruction
memory
Instruction
Add
Instruction
[20– 16]
MemtoReg
ALUOp
Branch
RegDst
ALUSrc
4
16 32
Instruction
[15– 0]
0
0
M
u
x
0
1
Add
Add
result
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
Sign
extend
M
u
x
1
ALU
result
Zero
Write
data
Read
data
M
u
x
1
ALU
control
Shift
left 2
RegWrite
MemRead
Control
ALU
Instruction
[15– 11]
6
EX
M
WB
M
WB
WB
IF/ID
PCSrc
ID/EX
EX/MEM
MEM/WB
M
u
x
0
1
MemWrite
Address
Data
memory
Address
Control signals
emanate from
the control
portions of the
pipeline registers
82

PROCESSOR AND CONTROL UNIT

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a PROCESSOR AND CONTROL UNIT

Similar a PROCESSOR AND CONTROL UNIT (20)

Último

Último (20)

PROCESSOR AND CONTROL UNIT