CSL718 : Pipelined Processors


    Improving Branch Performance
           22nd Jan, 2009


           Anshul Kumar, CSE IITD
Improving Branch Performance
• Branch Elimination
   – replace branch with other instructions
• Branch Speed Up
   – reduce time for computing CC and TIF
• Branch Prediction
   – guess the outcome and proceed, undo if necessary
• Branch Target Capture
   – make use of history



Branch Elimination

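                           Use conditional instructions
                              (predicated execution)

   [Figure: a branch on condition C that executes statement S only on the
    T path (the F path skips it) is replaced by one conditional statement C:S]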



   OP1                            OP1
   BC CC = Z, ∗ + 2               ADD R3, R2, R1, NZ
   ADD R3, R2, R1                 OP2
   OP2
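
The equivalence of the two sequences above can be sketched in Python. This only illustrates the semantics, not the pipeline timing. The function names `with_branch` and `predicated` are made up for the example, `z` stands for the condition code Z left by OP1, and the operand order R3 = R2 + R1 is an assumption.

```python
def with_branch(r1, r2, r3, z):
    """BC CC = Z, *+2 followed by ADD R3, R2, R1: the ADD is skipped when Z is set."""
    if not z:                 # branch not taken, so the ADD is executed
        r3 = r2 + r1
    return r3


def predicated(r1, r2, r3, z):
    """ADD R3, R2, R1, NZ: the ADD always flows through the pipeline,
    but its result is kept only when the NZ condition holds."""
    result = r2 + r1          # computed unconditionally
    return r3 if z else result


# Both versions leave the same value in R3 for either condition.
for z in (False, True):
    assert with_branch(1, 2, 99, z) == predicated(1, 2, 99, z)
```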
Branch Elimination - contd.
                                                         CC
       IF    IF   IF    D      AG DF DF DF EX EX
OP1
             IF   IF   IF      D    AG TIF TIF TIF
 BC
                  IF    IF     IF   D’               D   AG
 ADD/OP2


             IF   IF   IF      D    AG DF DF DF      EX EX
 ADD
 (cond)

Improving Branch Performance
• Branch Elimination
   – replace branch with other instructions
• Branch Speed Up
   – reduce time for computing CC and TIF
• Branch Prediction
   – guess the outcome and proceed, undo if necessary
• Branch Target Capture
   – make use of history




Branch Speed Up :
      early target address generation
•   Assume each instruction is Branch
•   Generate target address while decoding
•   If target in same page omit translation
•   After decoding discard target address if not
    Branch

           IF   IF   IF   D TIF TIF TIF
BC                        AG


Branch Speed Up :
          increase CC - branch gap
Increase the gap between condition checking
  and branching
• Early CC setting
• Delayed branch





Early CC setting: insert n instructions
                           (branch taken)
                                                           CC
       IF    IF   D    AG AG DF DF EX EX                         n=0
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D’                    D     AG
T
                        IF     IF’ D’                 IF    IF   D
T+1
                                         delay = 6

                                    (Delay can be reduced with
                                       larger target buffer)

Early CC setting: insert n instructions

                                                         CC
       IF    IF   D    AG AG DF DF EX EX                       n=1
I-1
             IF   IF   D       AG AG DF DF EX EX
J
                  IF    IF     D    AG AG TIF TIF
I
                        IF     IF   D’              D    AG
T
                               IF   IF’ D’          IF    IF   D
 T+1
                                             delay = 5


Early CC setting: insert n instructions
                                                           CC
        IF   IF   D     AG AG DF DF EX EX                        n=2
I-1
             IF   IF    D    AG AG DF DF EX EX
J
                   IF   IF   D    AG AG DF DF EX EX
K
                        IF   IF   D    AG AG TIF TIF
    I
                             IF   IF   D’                   D    AG
    T
                                  IF   IF’ D’         IF    IF   D
    T+1
                                                delay = 4

Early CC setting: insert n instructions
                                                      CC
      IF   IF   D     AG AG DF DF EX EX                        n=3
I-1
           IF   IF    D    AG AG DF DF EX EX
J
                 IF   IF   D    AG AG DF DF EX EX
K
                      IF   IF   D    AG AG DF DF EX EX
L
                           IF   IF   D    AG AG TIF TIF
I
                                IF   IF   D’                   D      AG
 T
                                     IF   IF’ D’        IF     IF        D
    T+1
                                                   delay = 4

Early CC setting: insert n instructions
                       (branch not taken)
                                                         CC
       IF    IF   D    AG AG DF DF EX EX                      n=0
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D’                   D    AG
I+1
                        IF     IF’ D’               IF    D
I+2
                                        delay = 5




Early CC setting: insert n instructions

                                                              CC
       IF    IF   D    AG AG DF DF EX EX                           n=1
I-1
             IF   IF   D       AG AG DF DF EX EX
J
                  IF    IF     D    AG AG TIF TIF
I
                        IF     IF   D’                   D    AG
I+1
                               IF   IF’ D’               IF    D
 I+2
                                             delay = 4


Early CC setting: insert n instructions
                                                          CC
          IF   IF   D     AG AG DF DF EX EX                    n=2
I-1
               IF   IF    D    AG AG DF DF EX EX
J
                     IF   IF   D    AG AG DF DF EX EX
K
                          IF   IF   D    AG AG TIF TIF
    I
                               IF   IF   D’          D    AG
    I+1
                                    IF   IF’ D’      IF    D
    I+2
                                              delay = 3

Early CC setting: insert n instructions
                                                              CC
          IF   IF   D     AG AG DF DF EX EX                         n=3
I-1
               IF   IF    D    AG AG DF DF EX EX
J
                     IF   IF   D    AG AG DF DF EX EX
K
                          IF   IF   D    AG AG DF DF EX EX
L
                               IF   IF   D    AG AG TIF TIF
I
                                    IF   IF   D’        D      AG
 I+1
                                         IF   IF’ D’     IF    D
    I+2
                                                   delay = 2

Delayed Branch: insert n instructions
                           (branch taken)
                                                         CC
       IF    IF   D    AG AG DF DF EX EX                       n=0
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D’                   D    AG
T
                        IF     IF’ D’               IF    IF   D
T+1
                                        delay = 6




Delayed Branch: insert n instructions
                                                         CC
       IF    IF   D    AG AG DF DF EX EX                       n=1
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D    AG AG DF DF EX EX
J
                        IF     IF   D’             D     AG
T
                               IF   IF’ D’          IF    IF   D
    T+1
                                             delay = 5


Delayed Branch: insert n instructions
                                                           CC
        IF   IF   D     AG AG DF DF EX EX                        n=2
I-1
             IF   IF    D    AG AG TIF TIF
I
                   IF   IF   D    AG AG DF DF EX EX
J
                        IF   IF   D    AG AG DF DF          EX EX
    K
                             IF   IF   D’            D      AG
    T
                                  IF   IF’ D’         IF    IF   D
    T+1
                                                delay = 4

Delayed Branch: insert n instructions
                                                      CC
      IF   IF   D     AG AG DF DF EX EX                    n=3
I-1
           IF   IF    D    AG AG TIF TIF
I
                 IF   IF   D    AG AG DF DF EX EX
J
                      IF   IF   D    AG AG DF DF EX EX
K
                           IF   IF   D    AG AG DF DF      EX EX
L
                                IF   IF   D’     D    AG
 T
                                     IF   IF’ D’ IF   IF   D
    T+1
                                               delay = 3

Delayed Branch: insert n instructions
                       (branch not taken)
                                                         CC
       IF    IF   D    AG AG DF DF EX EX                      n=0
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D’                   D    AG
I+1
                        IF     IF’ D’               IF    D
I+2
                                        delay = 5




Delayed Branch: insert n instructions
                                                              CC
          IF   IF   D    AG AG DF DF EX EX                         n=1
I-1
               IF   IF   D     AG AG TIF TIF
I
                    IF   IF    D    AG AG DF DF          EX EX
J
                         IF    IF   D’                   D    AG
I+1
                               IF   IF’ D’               IF    D
    I+2
                                             delay = 4


Delayed Branch: insert n instructions
                                                          CC
          IF   IF   D     AG AG DF DF EX EX                    n=2
I-1
               IF   IF    D    AG AG TIF TIF
I
                     IF   IF   D    AG AG DF DF EX EX
J
                          IF   IF   D    AG AG DF DF      EX EX
    K
                               IF   IF   D’          D    AG
    I+1
                                    IF   IF’ D’      IF    D
    I+2
                                              delay = 3

Delayed Branch: insert n instructions
                                                              CC
          IF   IF   D     AG AG DF DF EX EX                         n=3
I-1
               IF   IF    D    AG AG TIF TIF
I
                     IF   IF   D    AG AG DF DF EX EX
J
                          IF   IF   D    AG AG DF DF EX EX
K
                               IF   IF   D    AG AG DF DF           EX EX
L
                                    IF   IF   D’        D      AG
 I+1
                                         IF   IF’ D’     IF    D
    I+2
                                                   delay = 2
Summary - Branch Speed Up

                               n=0   n=1   n=2   n=3   n=4   n=5
early CC setting   uncond       4     4     4     4     4     4
                   cond (T)     6     5     4     4     4     4
                   cond (I)     5     4     3     2     1     0

delayed branch     uncond       4     3     2     1     0     0
                   cond (T)     6     5     4     3     2     1
                   cond (I)     5     4     3     2     1     0
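
The table entries follow a simple pattern, which the Python sketch below reproduces. The closed forms are fitted to the numbers in the table and are not stated in the slides; the function names and the `kind` labels (`"uncond"`, `"cond_T"`, `"cond_I"`) are made up for the example.

```python
def early_cc_delay(kind, n):
    """Delay when n instructions separate the CC-setting instruction from the branch."""
    if kind == "uncond":
        return 4                  # no CC is involved, so the gap does not help
    if kind == "cond_T":
        return max(4, 6 - n)      # bounded below by the unconditional-branch delay
    if kind == "cond_I":
        return max(0, 5 - n)
    raise ValueError(kind)


def delayed_branch_delay(kind, n):
    """Delay when n instructions are placed in delay slots after the branch."""
    base = {"uncond": 4, "cond_T": 6, "cond_I": 5}[kind]
    return max(0, base - n)       # each filled slot hides one cycle of delay


for kind in ("uncond", "cond_T", "cond_I"):
    print(kind,
          [early_cc_delay(kind, n) for n in range(6)],
          [delayed_branch_delay(kind, n) for n in range(6)])
```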
Improving Branch Performance
• Branch Elimination
   – replace branch with other instructions
• Branch Speed Up
   – reduce time for computing CC and TIF
• Branch Prediction
   – guess the outcome and proceed, undo if necessary
• Branch Target Capture
   – make use of history



Branch Prediction
• Treat conditional branches as unconditional
  branches / NOP
• Undo if necessary
Strategies:
   – Fixed (always guess inline)
   – Static (guess on the basis of instruction type)
   – Dynamic (guess based on recent history)


Prediction based on statistics

Instr      % of all    % taken    Guess    Correct    Guess    Correct
           branches

uncond     14.5        100%       always   14.5%      always   14.5%

cond       58          54%        never    27%        always   31%

loop       9.8         91%        always   9%         always   9%

call/ret   17.7        100%       always   17.7%      always   17.7%

Total                                      68.2%               72.2%
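
The totals can be recomputed from the first two columns. The sketch below is one way to do it; the dictionary layout and the name `correct_share` are made up for the example, and "always"/"never" are read as "always guess taken" and "never guess taken".

```python
# (share of all branches in %, fraction taken) per instruction class, from the table
classes = {
    "uncond":   (14.5, 1.00),
    "cond":     (58.0, 0.54),
    "loop":     (9.8,  0.91),
    "call/ret": (17.7, 1.00),
}


def correct_share(guess_taken):
    """Percentage of all branches guessed correctly.

    guess_taken maps a class name to True (always guess taken)
    or False (never guess taken).
    """
    total = 0.0
    for name, (share, p_taken) in classes.items():
        p_correct = p_taken if guess_taken[name] else 1.0 - p_taken
        total += share * p_correct
    return total


never_cond = {"uncond": True, "cond": False, "loop": True, "call/ret": True}
always_all = {name: True for name in classes}
print(round(correct_share(never_cond), 1))   # 67.8
print(round(correct_share(always_all), 1))   # 72.4
```

Without intermediate rounding the totals come out at about 67.8% and 72.4%. The 68.2% and 72.2% in the table are the sums of the rounded per-row figures (27% or 31% for cond, 9% for loop).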

Branch Prediction
                   (guess inline, go inline)
                                                   CC
       IF    IF   D    AG AG DF DF EX EX
I-1
             IF   IF   D       AG AG TIF TIF
I
                  IF    IF     D
I+1
                        IF     IF D
I+2
                                       delay = 0



Branch Prediction
                  (guess inline, goto target)
                                                         CC
       IF    IF   D     AG AG DF DF EX EX
I-1
             IF   IF    D      AG AG TIF TIF
I
                   IF   IF     D’                   D    AG
T
                        IF     IF’ D’               IF    IF   D
T+1
                                        delay = 6



Branch Prediction
                  (guess target, go inline)
                                                         CC
       IF    IF   D    AG AG DF DF EX EX
I-1
             IF   IF   D       AG AG TIF TIF
I
                                               D
T
                               D’                    D
I+1
                                    D’                    D
I+2
                                         delay = 5

Branch Prediction
                  (guess target, goto target)
                                                             CC
       IF    IF   D     AG AG DF DF EX EX
I-1
             IF    IF   D      AG AG TIF TIF
I
                   IF   IF     D’                   D    AG
T
                        IF     IF’ D’        IF     IF   D
T+1
                                        delay = 4
                        Same as unconditional branch
Static prediction strategy

Let p = probability of taking branch
guess target: delay_t = 4 p + 5 (1 - p) = 5 - p
guess inline: delay_i = 6 p + 0 (1 - p) = 6 p
⇒ if (delay_t < delay_i) guess target
  else guess inline
(delay_t < delay_i) ⇒ 5 - p < 6 p
                    ⇒ 5 < 7 p
                    ⇒ p > 5/7 ≈ .71

Static prediction strategy -
       thresholds for different instructions
                                                   CC
       IF    IF   D    AG AG DF DF EX EX
I-1
             IF   IF   D       AG AG TIF TIF
I
                                        actual
                                        T    I
                               guess T  4    5
                                     I  6    0
          guess target if 4 p + 5 (1 - p) < 6 p + 0 (1 - p)
                           i.e. p > .71

Static prediction strategy -
       thresholds for different instructions
                                                   CC
       IF    IF   D    AG AG DF DF EX EX
I-1
             IF   IF   D       AG AG TIF TIF EX EX
 I
Loop control                            actual
                                        T    I
                               guess T  4    6
                                     I  7    1
          guess target if 4 p + 6 (1 - p) < 7 p + 1 (1 - p)
                           i.e. p > .62

Static prediction strategy -
       thresholds for different instructions
                                                   CC
       IF    IF    D    AG AG DF DF EX EX
I-1
             IF    IF   D      AG TIF TIF
 I
register address                        actual
                                        T    I
                               guess T  3    5
                                     I  6    0
          guess target if 3 p + 5 (1 - p) < 6 p + 0 (1 - p)
                            i.e. p > .62
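
All three thresholds come from the same inequality, so they can be computed from the guess/actual delay matrices directly. The sketch below assumes a nested-dictionary layout `delay[guess][actual]` and a function name `threshold`, both made up for the example.

```python
from fractions import Fraction


def threshold(delay):
    """Taken-probability above which guessing target gives the lower expected delay.

    delay[guess][actual] holds the delay for each guess/actual pair,
    with 'T' = target and 'I' = inline.
    Guess target if  d_TT p + d_TI (1 - p) < d_IT p + d_II (1 - p),
    i.e.             p > a / (a + b)  with a and b as defined below.
    """
    a = delay["T"]["I"] - delay["I"]["I"]   # extra delay of guessing target when the branch goes inline
    b = delay["I"]["T"] - delay["T"]["T"]   # extra delay of guessing inline when the branch goes to target
    return Fraction(a, a + b)


general = {"T": {"T": 4, "I": 5}, "I": {"T": 6, "I": 0}}
loop_control = {"T": {"T": 4, "I": 6}, "I": {"T": 7, "I": 1}}
register_address = {"T": {"T": 3, "I": 5}, "I": {"T": 6, "I": 0}}

print(threshold(general))            # 5/7
print(threshold(loop_control))       # 5/8
print(threshold(register_address))   # 5/8
```

5/7 is about .71 and 5/8 is .625, which the slides show as .62.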
Delayed Branch with Nullification
              (also called annulment)
 •   Delay slot is used optionally
 •   Branch instruction specifies the option
 •   Option may be exercised based on
     correctness of branch prediction
 •   Helps in better utilization of delay slots



Variants of Nullification
 1. No annulment        2. Annul           3. Annul           4. Annul
                           if not taken       if taken           always
(branch-with-execute) (branch-or-skip) (branch-with-skip)


    bc                 bc                  bc                 bc
       D      D           D      D           D            D    D        D


                              Examples
                              •SPARC:           1, 2
                              •MC88100:         1, 4
                              •i860:            2, 4
                              •HP PA:           1, 2, 3
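
The four variants differ only in when the delay-slot instruction is allowed to complete. A minimal Python sketch of that rule follows; the function name and the use of the slide's numbers 1 to 4 as variant identifiers are choices made for the example.

```python
def delay_slot_executes(variant, branch_taken):
    """Return True if the delay-slot instruction is executed (not annulled).

    variant: 1 = no annulment, 2 = annul if not taken,
             3 = annul if taken, 4 = annul always.
    """
    if variant == 1:
        return True
    if variant == 2:
        return branch_taken
    if variant == 3:
        return not branch_taken
    if variant == 4:
        return False
    raise ValueError(f"unknown variant {variant}")


for variant in (1, 2, 3, 4):
    print(variant,
          delay_slot_executes(variant, branch_taken=True),
          delay_slot_executes(variant, branch_taken=False))
```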
Annulment illustration

use branch-or-skip                 use branch-with-skip


                              bc

                              D

                     bc

                     D

Dynamic Branch Prediction -
                basic idea
Predict based on the history of previous
  branch
loop: xxx                2 mispredictions
      xxx                for every
      xxx                occurrence
      xxx
      BC loop
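
The count of two mispredictions per loop occurrence can be checked with a small simulation. The sketch assumes the basic scheme simply repeats the last outcome of the branch (one bit of history); the loop length of five iterations and the function name are made up for the example.

```python
def last_outcome_mispredictions(outcomes, initial=True):
    """Count mispredictions of a predictor that repeats the last outcome."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken          # remember only the most recent outcome
    return misses


# One occurrence of the loop: BC loop is taken 4 times, then falls through.
one_occurrence = [True] * 4 + [False]

# 1 miss in the first occurrence (the exit), then 2 per occurrence
# (the re-entry and the exit): 1 + 9 * 2 = 19.
print(last_outcome_mispredictions(one_occurrence * 10))   # 19
```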

Dynamic Branch Prediction -
      2 bit prediction scheme

   [State diagram: four states 0-3 with transitions labelled T (taken) and
    N (not taken); states 0 and 1 predict taken, states 3 and 2 predict
    not taken]


Dynamic Branch Prediction -
                   Bimodal predictor

    Maintain saturating counters

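              T          T          T
        0 -------> 1 -------> 2 -------> 3
          <-------   <-------   <-------
              N          N          N

     (N in state 0 stays in 0; T in state 3 stays in 3)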
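
A saturating counter is easy to sketch in Python. The slide gives only the state transitions, so the prediction rule used here (states 2 and 3 predict taken) is an assumption, as are the class and function names. The loop pattern matches the loop example on the basic-idea slide, assuming five iterations.

```python
class SaturatingCounter:
    """2-bit saturating counter: T increments (max 3), N decrements (min 0)."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        # Assumption: the upper half of the range predicts taken.
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def mispredictions(outcomes, counter):
    """Run the counter over a sequence of outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Loop branch: taken 4 times, then not taken, repeated 10 times.
one_occurrence = [True] * 4 + [False]
print(mispredictions(one_occurrence * 10, SaturatingCounter(state=3)))   # 10
```

Only the loop exit is mispredicted, one miss per occurrence, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays "taken".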





Dynamic Branch Prediction -
                History of last n occurrences
                         current entry                     updated entry

outcome of last
three occurrences                         actual outcome
of this branch                                ‘taken’
                              1   1   0                     1    1       1
0 : not taken
1 : taken

                       prediction using
                       majority decision
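
The entry update and the majority decision can be sketched as follows. The class name is made up, and the newest outcome is assumed to enter on the left, which is consistent with the entry 1 1 0 becoming 1 1 1 after a taken branch.

```python
from collections import deque


class HistoryPredictor:
    """Keeps the outcomes of the last n occurrences of one branch
    (1 = taken, 0 = not taken) and predicts by majority decision."""

    def __init__(self, n=3, initial=None):
        bits = initial if initial is not None else [0] * n
        self.bits = deque(bits, maxlen=n)

    def predict(self):
        # Taken only if strictly more than half of the stored outcomes are 1.
        return sum(self.bits) * 2 > len(self.bits)

    def update(self, taken):
        # Newest outcome enters on the left; the oldest drops off the right.
        self.bits.appendleft(1 if taken else 0)


entry = HistoryPredictor(n=3, initial=[1, 1, 0])
print(entry.predict())      # True
entry.update(True)
print(list(entry.bits))     # [1, 1, 1]
```

With an even n a tie is possible; this sketch then predicts not taken, which is an arbitrary choice.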

Dynamic Branch Prediction -
     storing prediction counters
          store in separate buffer or in cache directory
           directory               storage
CACHE

                                                       cache line



                   counter
   One counter per branch or
   One counter per cache line -
                      merge results if multiple branches

Correct guesses vs. history length


n      Compiler   Business   Scientific   Supervisor
0      64.1       64.4       70.4         54.0
1      91.9       95.2       86.6         79.7
2      93.3       96.5       90.8         83.4
3      93.7       96.6       91.0         83.5
4      94.5       96.8       91.8         83.7
5      94.7       97.0       92.0         83.9


Two-Level Prediction
• Uses two levels of information to make a
  direction prediction
    – Branch History Table (BHT) - last n
      occurrences
    – Pattern History Table (PHT) - saturating 2 bit
      counters
• Captures patterned behavior of branches
    – Groups of branches are correlated
    – Particular branches have particular behavior
Correlation between branches
B1: if (x)
       ...
B2: if (y)
       ...
    z = x && y
B3: if (z)
       ...

• B3 can be predicted with 100% accuracy
  based on the outcomes of B1 and B2
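
A small simulation shows how a two-level scheme picks up this correlation. The sketch uses a 2-bit global history register to select a 2-bit saturating counter kept per branch. The class name, the initial counter value of 1, and the simplification that a true condition means "taken" are all assumptions of the example.

```python
from itertools import product


class GlobalTwoLevel:
    """Global history (last n outcomes of any branch) selects a 2-bit
    saturating counter in a per-branch pattern history table."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = 0          # global history register as an integer
        self.pht = {}             # (branch, history) -> counter in 0..3

    def predict(self, branch):
        return self.pht.get((branch, self.history), 1) >= 2

    def update(self, branch, taken):
        key = (branch, self.history)
        counter = self.pht.get(key, 1)
        self.pht[key] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Shift the new outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def b3_misses_per_pass(predictor, passes):
    """Run B1, B2, B3 over all (x, y) combinations; count B3 mispredictions."""
    result = []
    for _ in range(passes):
        misses = 0
        for x, y in product([False, True], repeat=2):
            z = x and y
            for branch, taken in (("B1", x), ("B2", y), ("B3", z)):
                if branch == "B3" and predictor.predict(branch) != taken:
                    misses += 1
                predictor.update(branch, taken)
        result.append(misses)
    return result


print(b3_misses_per_pass(GlobalTwoLevel(), passes=4))   # [1, 0, 0, 0]
```

After one warm-up miss for the history pattern "B1 taken, B2 taken", B3 is predicted correctly every time, because the two history bits seen at B3 are exactly x and y.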



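Some Two-level Predictors

   [Figure, left: Global Predictor. The GBHR (e.g. 10110) indexes the PHT,
    which gives T/NT.]
   [Figure, right: Local Predictor. The PC selects a BHT entry (entries
    shown: 11010, 01111, 11100, 00111); that history indexes the PHT,
    which gives T/NT.]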

        bits from PC and BHT can be combined to index PHT

Two-level Predictor Classification

• Yeh and Patt 3-letter naming scheme
   – Type of history collected
       • G (global), P (per branch), S (per set)
   – PHT type
       • A (adaptive), S (static)
   – PHT organization
       • g (global), p (per branch), s (per set)
• Examples - GAs, PAp etc.

Improving Branch Performance
• Branch Elimination
   – replace branch with other instructions
• Branch Speed Up
   – reduce time for computing CC and TIF
• Branch Prediction
   – guess the outcome and proceed, undo if necessary
• Branch Target Capture
   – make use of history



Branch Target Capture
           • Branch Target Buffer (BTB)
           • Target Instruction Buffer (TIB)




   entry:   instr addr | pred stats | target

   target = target addr (BTB) or target instr (TIB)
   prob of target change < 5%
BTB Performance


               BTB miss (.4)                BTB hit (.6)
decision       go inline                    go to target

result         inline (.8)   target (.2)    inline (.2)   target (.8)

delay          0             6              5             0

expected delay = .4*.8*0 + .4*.2*6 + .6*.2*5 + .6*.8*0
               = 1.08
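
The same weighted sum can be written as a small function, which makes it easy to try other hit rates. The function and parameter names are made up for the example; the default delays of 6 and 5 and the probabilities in the call are the ones shown above.

```python
def expected_btb_delay(p_hit, p_right_on_hit, p_right_on_miss,
                       delay_miss_target=6, delay_hit_inline=5):
    """Expected branch delay with implicit prediction from BTB hit/miss.

    hit  -> go to target (delay 0 if the branch is taken, else delay_hit_inline)
    miss -> go inline    (delay 0 if the branch is not taken, else delay_miss_target)
    """
    p_miss = 1.0 - p_hit
    return (p_miss * (1.0 - p_right_on_miss) * delay_miss_target
            + p_hit * (1.0 - p_right_on_hit) * delay_hit_inline)


print(round(expected_btb_delay(p_hit=0.6, p_right_on_hit=0.8,
                               p_right_on_miss=0.8), 2))   # 1.08
```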
Dynamic information about branch
 • Previous branch            • Previous target address /
   decisions                    instruction
 • Explicit prediction        • Implicit prediction
 • Stored in cache            • Stored in separate buffer
   directory                  Branch Target Buffer (BTB)
 Branch History Table (BHT)   Br Target Addr Cache (BTAC)

                              Target Instr Buffer (TIB)
                              Br Target Instr Cache (BTIC)
                 These two can be combined
Storing prediction info
           directory                storage


In cache
                                                   cache line


                       counter


In separate
buffer

               instr addr    pred stats   target
Combined prediction mechanism
• Explicit : use history bits
• Implicit : use BTB hit/miss
    – hit ⇒ go to target, miss ⇒ go inline
• Combined : BTB hit/miss followed by
  explicit prediction using history bits.
    – commonly used :
      hit ⇒ go to target, miss ⇒ explicit prediction
    – alternatively :
      miss ⇒ go inline, hit ⇒ explicit prediction
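
The two combined schemes differ only in which BTB outcome defers to the history bits. A minimal Python sketch of the decision logic follows; the function name, the scheme labels `"common"` and `"alternative"`, and the boolean `history_says_taken` are made up for the example.

```python
def combined_guess(btb_hit, history_says_taken, scheme="common"):
    """Return 'T' (go to target) or 'I' (go inline).

    scheme 'common':      hit -> go to target, miss -> explicit prediction
    scheme 'alternative': miss -> go inline,   hit  -> explicit prediction
    """
    explicit = "T" if history_says_taken else "I"
    if scheme == "common":
        return "T" if btb_hit else explicit
    if scheme == "alternative":
        return explicit if btb_hit else "I"
    raise ValueError(scheme)


for scheme in ("common", "alternative"):
    for hit in (False, True):
        for hist in (False, True):
            print(scheme, hit, hist, combined_guess(hit, hist, scheme))
```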

Combined prediction
                                             BTB miss      BTB hit
      BTB miss      BTB hit
                                                             T
         I

                     expl predict           expl predict
                                            I
                     I          T                    T


                                                           I              T
  I           T


             I           TI         T   I       TI         T
Prediction ⇒ T: Target, I: Inline Actual outcome ⇒ T: Target, I: Inline
Structure of Tables
Instruction fetch path with
• BHT
• BTAC
• BTIC




Compute/fetch scheme
                 (no dynamic branch prediction)

                                             A    I     I+1       I+2 I+3
                              Instruction
                           I Fetch address
              BTA          F
              IIFA         A
                                                      I - cache
                           R
Compute
 BTA                       +
                  Next sequential
                     address                     BTI BTI+1 BTI+2 BTI+3




BHT (Branch History Table)
                                                       Instruction
                                                      Fetch address
                                                    2222
                       I-cache
 128 x 4 lines                                                   128 x 4
                                                     BHT
                        16 K
  8 instr/line                                                   entries
                    4-way set assoc
                                                    2222
                            4 instr/cycle                        History bits

                                      4 x 1 instr   Prediction
    decode queue
                                                      logic
      issue queue                     4 x 1 instr

                                                 Taken / not taken
                                               BTA for a taken guess
BTAC scheme

                           A    I     I+1       I+2 I+3
            Instruction
         I Fetch address                                  BA   BTA
BTA      F
IIFA     A
                                    I - cache             BTAC
         R

         +
   Next sequential
      address                  BTI BTI+1 BTI+2 BTI+3




BTIC scheme - 1

                           A     I
            Instruction
         I Fetch address                  BA    BTI      BTA+
BTA      F
IIFA     A
                           I - cache            BTIC
         R

         +
   Next sequential
      address
                                       To decoder




BTIC scheme - 2

computed
                                   A     I         I+1
                    Instruction
                 I Fetch address                         BA    BTI     BTI+1
    BTA+         F
     IIFA        A
                                       I - cache              BTIC
                 R

                 +
           Next sequential
              address
                                                         To decoder




                                                                      slide 61
References
1. M.J. Flynn, "Computer Architecture: Pipelined and
   Parallel Processor Design", Narosa Publishing
   House / Jones and Bartlett, 1996.
2. D. Sima, T. Fountain, P. Kacsuk, "Advanced
   Computer Architectures: A Design Space
   Approach", Addison Wesley, 1997.
3. J.L. Hennessy, D.A. Patterson, "Computer
   Architecture: A Quantitative Approach",
   Morgan Kaufmann Publishers, 2006.
                                                   slide 62

