2. Improving Branch Performance
• Branch Elimination
– replace branch with other instructions
• Branch Speed Up
– reduce time for computing CC and TIF
• Branch Prediction
– guess the outcome and proceed, undo if necessary
• Branch Target Capture
– make use of history
slide 2
Anshul Kumar, CSE IITD
3. Branch Elimination
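Use conditional instructions (predicated execution)
Flowchart: test C; if true (T) execute S, if false (F) skip S. This becomes the guarded statement C:S
With branch:
OP1
BC CC = Z, ∗ + 2
ADD R3, R2, R1
OP2
With conditional instruction:
OP1
ADD R3, R2, R1, NZ
OP2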
4. Branch Elimination - contd.
Pipeline stages per instruction (relative cycle offsets between rows are not shown):
OP1 (sets CC) : IF IF IF D AG DF DF DF EX EX
BC            : IF IF IF D AG TIF TIF TIF
ADD/OP2       : IF IF IF D’ D AG
ADD (cond)    : IF IF IF D AG DF DF DF EX EX
5. Improving Branch Performance
• Branch Elimination
– replace branch with other instructions
• Branch Speed Up
– reduce time for computing CC and TIF
• Branch Prediction
– guess the outcome and proceed, undo if necessary
• Branch Target Capture
– make use of history
6. Branch Speed Up: early target address generation
• Assume each instruction is a branch
• Generate the target address while decoding
• If the target is in the same page, omit translation
• After decoding, discard the target address if the instruction is not a branch
BC : IF IF IF D TIF TIF TIF (AG overlapped with D)
7. Branch Speed Up: increase CC - branch gap
Increase the gap between condition checking and branching
• Early CC setting
• Delayed branch
8. Early CC setting: insert n instructions
(branch taken, n = 0)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 6
(Delay can be reduced with a larger target buffer)
9. Early CC setting: insert n instructions
n = 1
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 5
10. Early CC setting: insert n instructions
n = 2
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 4
11. Early CC setting: insert n instructions
n = 3
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
L             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 4
12. Early CC setting: insert n instructions
(branch not taken, n = 0)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 5
13. Early CC setting: insert n instructions
n = 1
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 4
14. Early CC setting: insert n instructions
n = 2
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 3
15. Early CC setting: insert n instructions
n = 3
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
L             : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 2
16. Delayed Branch: insert n instructions
(branch taken, n = 0)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 6
17. Delayed Branch : insert n instructions
n = 1
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 5
18. Delayed Branch : insert n instructions
n = 2
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 4
19. Delayed Branch : insert n instructions
n = 3
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
L             : IF IF D AG AG DF DF EX EX
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 3
20. Delayed Branch : insert n instructions
(branch not taken, n = 0)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 5
21. Delayed Branch : insert n instructions
n = 1
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 4
22. Delayed Branch : insert n instructions
n = 2
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 3
23. Delayed Branch : insert n instructions
n = 3
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
J             : IF IF D AG AG DF DF EX EX
K             : IF IF D AG AG DF DF EX EX
L             : IF IF D AG AG DF DF EX EX
I+1           : IF IF D’ D AG
I+2           : IF IF’ D’ IF D
delay = 2
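The delays quoted on the early CC setting and delayed branch slides can be collected into one table and averaged over taken and not-taken outcomes. The following is a minimal sketch: the delay values are read from the slides, while the helper name `expected_delay` and the probability used in the demo are assumptions made for illustration.

```python
from fractions import Fraction

# Delay cycles quoted on the slides, indexed by n = 0..3 inserted instructions.
DELAYS = {
    "early_cc": {"taken": [6, 5, 4, 4], "not_taken": [5, 4, 3, 2]},
    "delayed_branch": {"taken": [6, 5, 4, 3], "not_taken": [5, 4, 3, 2]},
}


def expected_delay(scheme, n, p_taken):
    """Average delay for a scheme with n inserted instructions.

    Hypothetical helper: weights the taken and not-taken delays by p_taken.
    """
    d = DELAYS[scheme]
    return p_taken * d["taken"][n] + (1 - p_taken) * d["not_taken"][n]


if __name__ == "__main__":
    p = Fraction(1, 2)  # assumed taken probability, for illustration only
    for scheme in DELAYS:
        for n in range(4):
            print(scheme, n, float(expected_delay(scheme, n, p)))
```

Using `Fraction` keeps the averages exact; plain floats would work equally well for a quick estimate.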
25. Improving Branch Performance
• Branch Elimination
– replace branch with other instructions
• Branch Speed Up
– reduce time for computing CC and TIF
• Branch Prediction
– guess the outcome and proceed, undo if necessary
• Branch Target Capture
– make use of history
26. Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
– Fixed (always guess inline)
– Static (guess on the basis of instruction type)
– Dynamic (guess based on recent history)
28. Branch Prediction
(guess inline, go inline)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
I+1           : IF IF D
I+2           : IF IF D
delay = 0
29. Branch Prediction
(guess inline, goto target)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 6
30. Branch Prediction
(guess target, go inline)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : D
I+1           : D’ D
I+2           : D’ D
delay = 5
31. Branch Prediction
(guess target, goto target)
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
T             : IF IF D’ D AG
T+1           : IF IF’ D’ IF IF D
delay = 4
Same as unconditional branch
32. Static prediction strategy
Let p = probability of taking the branch
guess target: delay_t = 4p + 5(1 - p) = 5 - p
guess inline: delay_i = 6p + 0(1 - p) = 6p
⇒ if (delay_t < delay_i) guess target, else guess inline
(delay_t < delay_i) ⇒ 5 - p < 6p ⇒ p > 5/7 ≈ 0.71
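The threshold derivation above can be checked numerically. This is a small sketch, assuming the four delays are passed as parameters named `d_tt`, `d_ti`, `d_it`, `d_ii` (delay when the guess is target or inline and the actual outcome is target or inline); these names and the function names are illustrative, not taken from the slides. The defaults are the values 4, 5, 6, 0 used in the derivation.

```python
from fractions import Fraction


def expected_delays(p, d_tt=4, d_ti=5, d_it=6, d_ii=0):
    """Return (delay_t, delay_i) for taken-probability p.

    d_xy is the delay when the guess is x and the actual outcome is y
    (t = target, i = inline).
    """
    delay_t = d_tt * p + d_ti * (1 - p)
    delay_i = d_it * p + d_ii * (1 - p)
    return delay_t, delay_i


def threshold(d_tt=4, d_ti=5, d_it=6, d_ii=0):
    """Value of p above which guessing target gives the smaller delay.

    Solves d_tt*p + d_ti*(1-p) < d_it*p + d_ii*(1-p) for p.
    """
    numerator = d_ti - d_ii
    denominator = (d_it - d_tt) + (d_ti - d_ii)
    if denominator <= 0:
        raise ValueError("no crossover point for these delays")
    return Fraction(numerator, denominator)


def guess(p, **delays):
    """Static choice for a branch with taken-probability p."""
    delay_t, delay_i = expected_delays(p, **delays)
    return "target" if delay_t < delay_i else "inline"


if __name__ == "__main__":
    print(threshold())            # 5/7
    print(guess(Fraction(3, 4)))  # target
    print(guess(Fraction(1, 2)))  # inline
```

Passing a different set of four delays gives the threshold for an instruction type with different pipeline timing.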
33. Static prediction strategy - thresholds for different instructions
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF
Delay by guess (rows) and actual outcome (columns), T = target, I = inline:
            actual T   actual I
guess T     4          5
guess I     6          0
guess target if 4p + 5(1 - p) < 6p + 0(1 - p)
i.e. p > .71
34. Static prediction strategy - thresholds for different instructions
Loop control
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG AG TIF TIF EX EX
Delay by guess (rows) and actual outcome (columns), T = target, I = inline:
            actual T   actual I
guess T     4          6
guess I     7          1
guess target if 4p + 6(1 - p) < 7p + 1(1 - p)
i.e. p > .62
35. Static prediction strategy - thresholds for different instructions
Register address
I-1 (sets CC) : IF IF D AG AG DF DF EX EX
I             : IF IF D AG TIF TIF
Delay by guess (rows) and actual outcome (columns), T = target, I = inline:
            actual T   actual I
guess T     3          5
guess I     6          0
guess target if 3p + 5(1 - p) < 6p + 0(1 - p)
i.e. p > .62
36. Delayed Branch with Nullification
(Also called annulment)
• Delay slot is used optionally
• Branch instruction specifies the option
• Option may be exercised based on correctness of branch prediction
• Helps in better utilization of delay slots
37. Variants of Nullification
Four variants (bc = branch, D = delay slot instruction):
1. No annulment (branch-with-execute)
2. Annul if not taken (branch-or-skip)
3. Annul if taken
4. Annul always (branch-with-skip)
Examples
• SPARC: 1, 2
• MC88100: 1, 4
• i860: 2, 4
• HP PA: 1, 2, 3
39. Dynamic Branch Prediction - basic idea
Predict based on the history of previous branch
loop: xxx
      xxx
      xxx
      xxx
      BC loop
2 mispredictions for every occurrence of the loop
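The figure of 2 mispredictions per loop occurrence can be reproduced with a short simulation. This is a sketch under the assumption that the predictor simply repeats the last outcome of the branch and that the loop-closing branch is taken on every iteration except the last; the function names and the iteration counts are illustrative.

```python
def loop_branch_outcomes(iterations, occurrences):
    """Outcomes of the loop-closing branch: taken until the final iteration."""
    one_pass = [True] * (iterations - 1) + [False]
    return one_pass * occurrences


def last_outcome_mispredictions(outcomes, initial_prediction=True):
    """Count mispredictions when predicting the previous outcome of the branch."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


if __name__ == "__main__":
    outcomes = loop_branch_outcomes(iterations=5, occurrences=10)
    print(last_outcome_mispredictions(outcomes))
```

After the first pass, each occurrence of the loop costs one misprediction at loop exit and one more on re-entry, because the exit overwrote the stored outcome.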
40. Dynamic Branch Prediction - 2 bit prediction scheme
[State diagram: four states 0, 1, 2, 3 with transitions labelled T (taken) and N (not taken)]
States 0/1: predict not taken
States 3/2: predict taken
41. Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
Counter states 0, 1, 2, 3:
T (taken): 0 → 1, 1 → 2, 2 → 3, 3 → 3
N (not taken): 3 → 2, 2 → 1, 1 → 0, 0 → 0
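A saturating counter of this kind can be sketched in a few lines. The class below is illustrative only: it assumes a 2-bit counter (states 0 to 3) and that the upper half of the range (states 2 and 3) predicts taken; the class and function names are not from the slides.

```python
class SaturatingCounter:
    """Up/down counter that sticks at 0 and at its maximum value."""

    def __init__(self, value=0, maximum=3):
        self.value = value
        self.maximum = maximum

    def predict_taken(self):
        # Assumption: upper half of the range (2, 3 for a 2-bit counter) predicts taken.
        return self.value > self.maximum // 2

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.maximum)
        else:
            self.value = max(self.value - 1, 0)


def count_mispredictions(outcomes, counter):
    """Predict each outcome with the counter, then train it on the actual result."""
    misses = 0
    for taken in outcomes:
        if counter.predict_taken() != taken:
            misses += 1
        counter.update(taken)
    return misses


if __name__ == "__main__":
    # Loop-closing branch: taken four times, then not taken, repeated ten times.
    outcomes = ([True] * 4 + [False]) * 10
    print(count_mispredictions(outcomes, SaturatingCounter()))
```

Once warmed up, a single not-taken outcome at loop exit only moves the counter from 3 to 2, so the re-entry is still predicted taken and each loop occurrence costs one misprediction.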
42. Dynamic Branch Prediction - History of last n occurrences
Each entry holds the outcome of the last three occurrences of this branch (0 : not taken, 1 : taken)
current entry: 1 1 0
actual outcome: ‘taken’
updated entry: 1 1 1
Prediction using majority decision
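The majority-decision scheme can be illustrated as follows. This is a sketch that assumes the newest outcome enters at the left of the entry and the oldest bit drops off the right, which matches the 1 1 0 to 1 1 1 example; the function names are hypothetical.

```python
from collections import deque


def predict_taken(history):
    """Majority vote over the stored outcomes (1 = taken, 0 = not taken)."""
    return sum(history) * 2 > len(history)


def record_outcome(history, taken):
    """Shift the newest outcome in at the left; a full deque drops its rightmost bit."""
    history.appendleft(1 if taken else 0)


if __name__ == "__main__":
    entry = deque([1, 1, 0], maxlen=3)  # current entry from the slide
    print(predict_taken(entry))         # two of three were taken
    record_outcome(entry, taken=True)   # actual outcome: taken
    print(list(entry))
```

An odd history length avoids ties in the vote; with an even length the sketch above resolves a tie as not taken.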
43. Dynamic Branch Prediction - storing prediction counters
Store in a separate buffer or in the cache directory
[Figure: cache with directory and storage; the directory entry of each cache line holds a counter]
• One counter per branch, or
• One counter per cache line - merge results if multiple branches
44. Correct guesses vs. history length
Percentage of correct guesses for history length n, by workload:
n   Compiler   Business   Scientific   Supervisor
0   64.1       64.4       70.4         54.0
1   91.9       95.2       86.6         79.7
2   93.3       96.5       90.8         83.4
3   93.7       96.6       91.0         83.5
4   94.5       96.8       91.8         83.7
5   94.7       97.0       92.0         83.9
45. Two-Level Prediction
• Uses two levels of information to make a direction prediction
– Branch History Table (BHT) - last n occurrences
– Pattern History Table (PHT) - saturating 2 bit counters
• Captures patterned behavior of branches
– Groups of branches are correlated
– Particular branches have particular behavior
46. Correlation between branches
B1: if (x)
...
B2: if (y)
...
z = x && y
B3: if (z)
...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
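The correlation between B1, B2 and B3 can be demonstrated with a toy predictor. The sketch below assumes a 2-bit global history made of the outcomes of B1 and B2, used to index a table of four 2-bit saturating counters that predict B3; the table size, the initial counter values and the function name are illustrative choices, not details from the slides.

```python
from itertools import product


def b3_mispredictions(rounds=5):
    """Predict B3 from the outcomes of B1 and B2 and count the misses.

    Every (x, y) combination is presented once per round.
    """
    pht = [0] * 4  # one 2-bit saturating counter per (B1, B2) history
    misses = 0
    for _ in range(rounds):
        for x, y in product([False, True], repeat=2):
            history = (int(x) << 1) | int(y)  # outcomes of B1 and B2
            z = x and y                       # actual outcome of B3
            predicted = pht[history] >= 2
            if predicted != z:
                misses += 1
            # Train the counter selected by this history.
            if z:
                pht[history] = min(pht[history] + 1, 3)
            else:
                pht[history] = max(pht[history] - 1, 0)
    return misses


if __name__ == "__main__":
    print(b3_mispredictions(rounds=5))
    print(b3_mispredictions(rounds=100))
```

The only misses occur while the counter for the history "B1 taken, B2 taken" warms up; after that every B3 outcome is predicted correctly, however many rounds are run.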
47. Some Two-level Predictors
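[Figure: two predictor organizations. Local Predictor: the PC selects an entry in the BHT, and that history indexes the PHT, which gives T/NT. Global Predictor: the GBHR indexes the PHT, which gives T/NT. Example history patterns shown: 10110, 11010, 01111, 11100, 00111]
bits from PC and BHT can be combined to index PHT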
48. Two-level Predictor Classification
• Yeh and Patt 3-letter naming scheme
– Type of history collected
• G (global), P (per branch), S (per set)
– PHT type
• A (adaptive), S (static)
– PHT organization
• g (global), p (per branch), s (per set)
• Examples - GAs, PAp etc.
49. Improving Branch Performance
• Branch Elimination
– replace branch with other instructions
• Branch Speed Up
– reduce time for computing CC and TIF
• Branch Prediction
– guess the outcome and proceed, undo if necessary
• Branch Target Capture
– make use of history
51. BTB Performance
BTB outcome   decision       actual result   probability   delay
miss (.4)     go inline      inline          .8            0
miss (.4)     go inline      target          .2            6
hit (.6)      go to target   inline          .2            5
hit (.6)      go to target   target          .8            0
Expected delay = .4*.8*0 + .4*.2*6 + .6*.2*5 + .6*.8*0 = 1.08
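The expected delay is a probability-weighted sum over the four cases, which can be verified as follows. The probabilities and delays are the ones quoted on the slide; the names `CASES` and `expected_btb_delay` are illustrative.

```python
from fractions import Fraction

P_MISS = Fraction(4, 10)
P_HIT = Fraction(6, 10)

# (probability of BTB outcome, probability of actual result, delay in cycles)
CASES = [
    (P_MISS, Fraction(8, 10), 0),  # miss, go inline, result inline
    (P_MISS, Fraction(2, 10), 6),  # miss, go inline, result target
    (P_HIT, Fraction(2, 10), 5),   # hit, go to target, result inline
    (P_HIT, Fraction(8, 10), 0),   # hit, go to target, result target
]


def expected_btb_delay(cases=CASES):
    """Sum of P(BTB outcome) * P(result | BTB outcome) * delay over all cases."""
    return sum(p_btb * p_result * delay for p_btb, p_result, delay in cases)


if __name__ == "__main__":
    print(float(expected_btb_delay()))
```

Only the two mispredicted cases contribute: 0.4 * 0.2 * 6 = 0.48 and 0.6 * 0.2 * 5 = 0.60.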
52. Dynamic information about branch
Two kinds of dynamic information can be kept:
• Previous branch decisions
  – Explicit prediction
  – Stored in cache directory
  – Branch History Table (BHT)
• Previous target address / instruction
  – Implicit prediction
  – Stored in separate buffer
  – Branch Target Buffer (BTB), Br Target Addr Cache (BTAC), Target Instr Buffer (TIB), Br Target Instr Cache (BTIC)
These two can be combined
53. Storing prediction info
• In cache: the directory entry of each cache line holds a counter
• In separate buffer: each entry holds instr addr, pred stats, target
54. Combined prediction mechanism
• Explicit: use history bits
• Implicit: use BTB hit/miss
– hit ⇒ go to target, miss ⇒ go inline
• Combined: BTB hit/miss followed by explicit prediction using history bits
– commonly used: hit ⇒ go to target, miss ⇒ explicit prediction
– alternatively: miss ⇒ go inline, hit ⇒ explicit prediction
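The implicit scheme and the two combined variants amount to a small decision function. The sketch below returns True for "go to target" and False for "go inline"; the function names and the variant labels `hit_to_target` and `miss_to_inline` are made up for illustration.

```python
def implicit_predict(btb_hit):
    """Implicit prediction: hit => go to target, miss => go inline."""
    return btb_hit


def combined_predict(btb_hit, explicit_taken, variant="hit_to_target"):
    """Combine the BTB lookup with an explicit (history-bit) prediction.

    variant="hit_to_target":  hit => go to target, miss => explicit prediction
    variant="miss_to_inline": miss => go inline, hit => explicit prediction
    Returns True for "go to target", False for "go inline".
    """
    if variant == "hit_to_target":
        return True if btb_hit else explicit_taken
    if variant == "miss_to_inline":
        return explicit_taken if btb_hit else False
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    for hit in (False, True):
        for explicit in (False, True):
            print(hit, explicit,
                  combined_predict(hit, explicit, "hit_to_target"),
                  combined_predict(hit, explicit, "miss_to_inline"))
```

The two variants differ only when the BTB lookup and the history bits disagree: the first trusts a hit unconditionally, the second trusts a miss unconditionally.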
55. Combined prediction
[Figure: decision trees for the combined schemes. Each tree starts from BTB miss / BTB hit, applies explicit prediction (expl predict) on one side, and ends in the prediction and actual outcome combinations]
Prediction ⇒ T: Target, I: Inline
Actual outcome ⇒ T: Target, I: Inline
57. Compute/fetch scheme
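(no dynamic branch prediction)
[Figure: the instruction fetch address register (IFAR) supplies the instruction fetch address to the I-cache, which holds instructions I, I+1, I+2, I+3 and the branch target instructions BTI, BTI+1, BTI+2, BTI+3. The next fetch address is either the next sequential address (+) or the BTA produced by the compute BTA unit]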
58. BHT (Branch History Table)
[Figure: the instruction fetch address indexes the I-cache and the BHT in parallel]
• I-cache: 16 K, 128 x 4 lines, 8 instr/line, 4-way set assoc, 4 instr/cycle
• BHT: 128 x 4 entries, supplying history bits (labelled 2222 in the figure)
• Decode queue: 4 x 1 instr
• Issue queue: 4 x 1 instr
• Prediction logic: takes the history bits and outputs Taken / not taken
• BTA for a taken guess
59. BTAC scheme
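[Figure: the instruction fetch address register (IFAR) supplies the instruction fetch address to the I-cache (holding I, I+1, I+2, I+3 and BTI, BTI+1, BTI+2, BTI+3) and to the BTAC. Each BTAC entry pairs a branch address (BA) with its BTA. The next fetch address is either the next sequential address (+) or the BTA read from the BTAC]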
60. BTIC scheme - 1
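[Figure: the instruction fetch address register (IFAR) supplies the instruction fetch address to the I-cache and to the BTIC. Each BTIC entry holds BA, BTI and BTA+. On a hit the BTI goes to the decoder, and the next fetch address is either the next sequential address (+) or BTA+]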
61. BTIC scheme - 2
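[Figure: the instruction fetch address register (IFAR) supplies the instruction fetch address to the I-cache and to the BTIC. Each BTIC entry holds BA, BTI and BTI+1, which go to the decoder on a hit. BTA+ is computed rather than stored, and the next fetch address is either the next sequential address (+) or BTA+]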
62. References
1. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
2. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
3. D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2006.