Parallel prefix
adders

Kostas Vitoroulis, 2006.
Presented to Dr. A. J. Al-Khalili.
Concordia University.
Overview of presentation
 Parallel prefix operations
 Binary addition as a parallel prefix
  operation
 Prefix graphs
 Adder topologies
 Summary
Parallel Prefix Operation
Terminology background:

   Prefix: Each output depends on all of the inputs up to and including
    its own position.

   Parallel: The operation is split into smaller pieces that are
    computed in parallel.

   Operation: Any primitive operator “ ° ” that is associative can be
    parallelized, and the result is fast because the pieces are
    processed at the same time.
Example: Associative operations are parallelizable

Consider the logical OR operation: a + b
The operation is associative:
a + b + c + d = ((( a + b ) + c) + d ) = (( a + b ) + ( c + d))


   Serial implementation:     a+b  ->  (a+b)+c  ->  ((a+b)+c)+d          (3 levels)
   Parallel implementation:   a+b and c+d together  ->  (a+b)+(c+d)      (2 levels)
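As a small illustration of the two groupings, the sketch below reduces a list with logical OR both ways. The helper names `serial_reduce` and `tree_reduce` are illustrative and do not come from the slides.

```python
from functools import reduce
from operator import or_


def serial_reduce(op, values):
    """Chain left to right: ((a op b) op c) op d. Needs len(values) - 1 levels."""
    return reduce(op, values)


def tree_reduce(op, values):
    """Combine neighbours pairwise, level by level (about log2(n) levels).

    The result matches serial_reduce only when op is associative, as OR is.
    """
    level = list(values)
    while len(level) > 1:
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:              # an odd element moves on unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


bits = [0, 1, 0, 0]                     # a, b, c, d
print(serial_reduce(or_, bits), tree_reduce(or_, bits))
```

All pairs on one level of `tree_reduce` are independent of each other, which is what allows hardware to evaluate them at the same time.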
Mathematical Formulation: Prefix Sum
   Operator: “ ° ”
    Applied to a whole vector, this is the unary operation known as
    “scan” or “prefix sum”.

   Input is a vector:
        A = An An-1 … A1

   Output is another vector:
        B = Bn Bn-1 … B1
    where
        B1 = A1
        B2 = A1 ° A2
        …
        Bn = A1 ° A2 ° … ° An

    Bn represents the operator being applied to all terms of the vector.
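The definition can be written directly as a loop. This is a minimal sketch; the function name `prefix_scan` is an assumption, and the list is stored with A1 first, whereas the slides write the vector with An first.

```python
from operator import add


def prefix_scan(op, a):
    """Inclusive scan: a = [A1, A2, ..., An] gives [B1, B2, ..., Bn]."""
    b = []
    for x in a:
        # B1 = A1, and afterwards Bi = B(i-1) op Ai
        b.append(x if not b else op(b[-1], x))
    return b


print(prefix_scan(add, [1, 2, 3]))   # [1, 3, 6]
```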
Example of prefix sum
Consider the vector A = An An-1 … A1, where each element Ai is an integer.

The “*” unary operator is defined as:
    *A = B
with
    B = Bn Bn-1 … B1

    B1 = A1
    B2 = A1 * A2
    B3 = A1 * A2 * A3
    …

where the binary ‘ * ’ is the integer addition operation.
Example of prefix sum
Calculation of *A, where A = 6 5 4 3 2 1 yields:
                               B = *A = 21 15 10 6 3 1

Because the summation is associative the calculation can be done in parallel in the
following manner:
[Figure: parallel implementation versus serial implementation of the prefix
sum of 6 5 4 3 2 1, each producing B6 B5 B4 B3 B2 B1.]

In the parallel implementation, for example:
    B6 = (A6 + A5) + ((A4 + A3) + (A2 + A1)) = 21
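One common way to compute every prefix in parallel is recursive doubling, sketched below. The scheme and the name `doubling_scan` are illustrative choices and not necessarily the exact graph drawn on the slide; the list is stored with A1 first.

```python
from operator import add


def doubling_scan(op, a):
    """Inclusive scan in about log2(n) levels.

    On each level, element i combines with the element `dist` places before it.
    Every update reads only the previous level, so a whole level could run
    in parallel. Returns (prefixes, number_of_levels).
    """
    vals = list(a)
    dist, levels = 1, 0
    while dist < len(vals):
        vals = [vals[i] if i < dist else op(vals[i - dist], vals[i])
                for i in range(len(vals))]
        dist *= 2
        levels += 1
    return vals, levels


print(doubling_scan(add, [1, 2, 3, 4, 5, 6]))   # ([1, 3, 6, 10, 15, 21], 3)
```

The serial loop needs five dependent additions for six elements, while this version needs three levels at the cost of more additions overall.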
Binary Addition
This is the pen-and-paper addition of two 4-bit binary numbers x and y.
c represents the generated carries.
s represents the produced sum bits.

        c3   c2   c1   c0
             x3   x2   x1   x0
   +         y3   y2   y1   y0
        s4   s3   s2   s1   s0

A stage of the addition is the set of x and y bits being used to produce
the appropriate sum and carry bits. For example, the bits x2, y2
constitute stage 2, which generates carry c2 and sum s2.

Each stage i adds bits xi, yi, ci-1 and produces bits si, ci.
The following hold:

     xi    yi    ci      Comment:                                     Formal definition:
     0     0     0       The stage “kills” an incoming carry.         “Kill” bit:       ki = NOT ( xi + yi )
     0     1     ci-1    The stage “propagates” an incoming carry.    “Propagate” bit:  pi = xi ⊕ yi
     1     0     ci-1    The stage “propagates” an incoming carry.
     1     1     1       The stage “generates” a carry out.           “Generate” bit:   gi = xi ⋅ yi
The carry ci generated by a stage i is given by the equation:

               ci = gi + pi ⋅ ci−1 = xi ⋅ yi + ( xi ⊕ yi ) ⋅ ci−1

This equation can be simplified to:

               ci = xi ⋅ yi + ( xi + yi ) ⋅ ci−1 = gi + ai ⋅ ci−1

The “ai” term in the equation is the “alive” bit, ai = xi + yi.
The latter form of the equation uses an OR gate instead of an XOR; the OR gate is more efficient when implemented
in CMOS technology. Note that:

               ai = NOT ki

where ki = NOT ( xi + yi ) is the “kill” bit from the table above.
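The equivalence of the two carry forms can be checked by brute force over one stage. The function names below are illustrative; the kill bit is computed as the complement of the alive bit.

```python
from itertools import product


def stage_signals(x, y):
    """Return (generate, propagate, alive, kill) for one bit position."""
    g = x & y          # both inputs 1: a carry is generated
    p = x ^ y          # exactly one input 1: an incoming carry is propagated
    a = x | y          # at least one input 1: an incoming carry stays alive
    k = 1 - a          # both inputs 0: an incoming carry is killed
    return g, p, a, k


def carry_with_propagate(x, y, c_in):
    g, p, _, _ = stage_signals(x, y)
    return g | (p & c_in)


def carry_with_alive(x, y, c_in):
    g, _, a, _ = stage_signals(x, y)
    return g | (a & c_in)


for x, y, c_in in product((0, 1), repeat=3):
    assert carry_with_propagate(x, y, c_in) == carry_with_alive(x, y, c_in)
print("both carry forms agree on all 8 input combinations")
```

The two forms differ only when xi = yi = 1, and in that case gi = 1 already forces the carry to 1.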
Carry Look Ahead adders
The CLA adder has the following 3-stage structure:

   1. Pre-calculation of pi, gi for each stage.
   2. Calculation of carry ci for each stage.
   3. Combine ci and pi of each stage to generate the sum bits si,
      which form the final sum.
Carry Look Ahead adders
   The pre-calculation stage is implemented using the
    equations for pi, gi shown on a previous slide:

    [Figure: the bit pairs (x2, y2), (x1, y1), (x0, y0) each feed a block producing (g2, p2), (g1, p1), (g0, p0).]

   Alternatively using the “alive” bit:

    [Figure: the bit pairs (x2, y2), (x1, y1), (x0, y0) each feed a block producing (g2, a2), (g1, a1), (g0, a0).]

   Note the symmetry when we use the “propagate” or the “alive” bit: either one can be used in the carry equations.
Carry Look Ahead adders
   The carry calculation stage is implemented using the
    equations produced when unfolding the recursive
    equation:

        ci = gi + pi ⋅ ci−1 = gi + ai ⋅ ci−1

        c0 = g0
        c1 = g1 + p1 ⋅ g0
        c2 = g2 + p2 ⋅ c1 = g2 + p2 ⋅ ( g1 + p1 ⋅ g0 )
           = g2 + p2 ⋅ g1 + p2 ⋅ p1 ⋅ g0
        etc.

    [Figure: carry generator block with inputs (g2, p2), (g1, p1), (g0, p0) and outputs c2, c1, c0.]
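The unfolding can be checked mechanically. In this sketch (function names are illustrative, carry-in assumed to be 0 as in c0 = g0) the recursive form and the flattened sum-of-products form are compared for every 3-bit combination of g and p values.

```python
from itertools import product


def carries_recursive(g, p):
    """c_i = g_i + p_i * c_(i-1), evaluated stage after stage (index 0 = LSB)."""
    carries, c = [], 0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


def carries_unfolded(g, p):
    """c_i = g_i + p_i g_(i-1) + ... + p_i ... p_1 g_0, each c_i from inputs only."""
    carries = []
    for i in range(len(g)):
        ci = 0
        for j in range(i + 1):
            term = g[j]                     # g_j ...
            for k in range(j + 1, i + 1):
                term &= p[k]                # ... ANDed with p_(j+1) ... p_i
            ci |= term
        carries.append(ci)
    return carries


for bits in product((0, 1), repeat=6):
    g, p = list(bits[:3]), list(bits[3:])
    assert carries_recursive(g, p) == carries_unfolded(g, p)
print("unfolded equations match the recursion for 3 stages")
```

The unfolded form removes the stage-to-stage dependency, but the number and width of the product terms grow with the bit position.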
Carry Look Ahead adders
   The final sum calculation stage is implemented using the carry and
    propagate bits ci, pi:

        si = pi ⊕ ci−1 , with pi = xi ⊕ yi

    Note: in terms of the ‘alive’ bit, pi = ai ⋅ NOT gi, so
        si = ( ai ⋅ NOT gi ) ⊕ ci−1 , with ai = xi + yi

    [Figure: XOR gates combining (c2, p3), (c1, p2), (c0, p1), (cin, p0) into s3, s2, s1, s0.]

   If the ‘alive’ bit ai is used the final sum stage becomes more
    complex, as implied by the equations above.
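Putting the three stages together gives a complete adder. The sketch below is a bit-level software model, with the hypothetical name `cla_add` and a carry-in fixed at 0; the carries are computed with the recursive equation for brevity, so it models the logic, not the timing, of a CLA.

```python
def cla_add(x, y, n):
    """Add two n-bit unsigned integers using the g/p formulation."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]

    # Stage 1: pre-calculation of generate and propagate bits
    g = [a & b for a, b in zip(xb, yb)]
    p = [a ^ b for a, b in zip(xb, yb)]

    # Stage 2: carries, c_i = g_i + p_i * c_(i-1), with carry-in 0
    c, prev = [], 0
    for i in range(n):
        prev = g[i] | (p[i] & prev)
        c.append(prev)

    # Stage 3: sum bits, s_i = p_i XOR c_(i-1); the last carry becomes bit n
    s = [p[i] ^ (c[i - 1] if i > 0 else 0) for i in range(n)]
    s.append(c[n - 1])
    return sum(bit << i for i, bit in enumerate(s))


print(cla_add(11, 6, 4))   # 17
```

Only stage 2 has a dependency that spans all bit positions, which is why the later slides concentrate on it.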
Binary addition as a prefix sum problem.
   We define a new operator: “ ° ”
   Input is a vector of pairs of ‘propagate’ and ‘generate’ bits:
        ( gn , pn ) ( gn−1 , pn−1 ) … ( g0 , p0 )

   Output is a new vector of pairs:
        ( Gn , Pn ) ( Gn−1 , Pn−1 ) … ( G0 , P0 )

   Each pair of the output vector is calculated by the
    following definition:
        ( Gi , Pi ) = ( gi , pi ) ° ( Gi−1 , Pi−1 )

    where:
        ( G0 , P0 ) = ( g0 , p0 )
        ( gx , px ) ° ( gy , py ) = ( gx + px ⋅ gy , px ⋅ py )
        with +, ⋅ being the OR, AND operations
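The definition translates almost literally into code. In this sketch the names `prefix_op` and `prefix_carries` are illustrative, pairs are tuples `(g, p)`, and index 0 is the least significant bit.

```python
def prefix_op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_carries(x, y, n):
    """Return [G_0, ..., G_(n-1)], the carries out of each stage (carry-in 0)."""
    pairs = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
             for i in range(n)]
    out = [pairs[0]]                      # (G_0, P_0) = (g_0, p_0)
    for i in range(1, n):
        out.append(prefix_op(pairs[i], out[-1]))
    return [g for g, _ in out]


print(prefix_carries(0b1011, 0b0110, 4))   # [0, 1, 1, 1]
```

This loop applies the operator serially; the point of the prefix formulation is that the same results can be obtained from any grouping of the operator applications.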
Binary addition as a prefix sum problem.
   Properties of operator “ ° ”:
     - Associativity (hence parallelization)
          Easy to prove based on the fact that the logical AND, OR
          operations are associative and AND distributes over OR.
     - With the definition:
            ( Gi , Pi ) = ( gi , pi ) ° ( Gi−1 , Pi−1 )
            where ( G0 , P0 ) = ( g0 , p0 )
          Gi becomes the carry signal at stage i of an adder.
          Illustration on next slide.

     - The operation is idempotent:
            ( gx , px ) ° ( gx , px ) = ( gx + px ⋅ gx , px ⋅ px ) = ( gx , px )
          which implies that the two sub-ranges being combined may overlap:
            ( Gi:j , Pi:j ) = ( Gi:n , Pi:n ) ° ( Gm:j , Pm:j )
            where i ≥ j and m ≥ n
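Because the operator acts on only four possible pairs, both properties can be verified exhaustively. The sketch below uses illustrative names (`op`, `span`) and also checks one overlapping combination of group terms.

```python
from itertools import product

PAIRS = list(product((0, 1), repeat=2))        # all possible (g, p) values


def op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def span(pairs, hi, lo):
    """Group term (G_hi:lo, P_hi:lo): fold pairs[hi] down to pairs[lo]."""
    acc = pairs[hi]
    for k in range(hi - 1, lo - 1, -1):
        acc = op(acc, pairs[k])
    return acc


associative = all(op(op(a, b), c) == op(a, op(b, c))
                  for a, b, c in product(PAIRS, repeat=3))
idempotent = all(op(a, a) == a for a in PAIRS)
# Overlapping sub-ranges 3:1 and 2:0 still give the full range 3:0
overlap_ok = all(op(span(v, 3, 1), span(v, 2, 0)) == span(v, 3, 0)
                 for v in product(PAIRS, repeat=4))
print(associative, idempotent, overlap_ok)     # True True True
```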
Binary Addition as a prefix sum problem.

             a3   a2   a1   a0
        +    b3   b2   b1   b0

A stage i will generate a carry if
        gi = ai ⋅ bi
and propagate a carry if
        pi = XOR( ai , bi )
Hence for stage i:
        ci = gi + pi ⋅ ci−1

With:
        ( Gi , Pi ) = ( gi , pi ) ° ( Gi−1 , Pi−1 )
        ( gx , px ) ° ( gy , py ) = ( gx + px ⋅ gy , px ⋅ py )
where:
        ( G0 , P0 ) = ( g0 , p0 )

we have:
        ( G1 , P1 ) = ( g1 , p1 ) ° ( G0 , P0 ) = ( g1 + p1 ⋅ g0 , p1 ⋅ p0 )
        ( G2 , P2 ) = ( g2 , p2 ) ° ( G1 , P1 ) = ( g2 + p2 ⋅ ( g1 + p1 ⋅ g0 ) , p2 ⋅ p1 ⋅ p0 )
                    = ( g2 + p2 ⋅ g1 + p2 ⋅ p1 ⋅ g0 , p2 ⋅ p1 ⋅ p0 )
        etc.

… The familiar carry bit generating equations for stage i in a CLA adder.
Addition as a prefix sum problem.
Conclusion:

The equations of the well known CLA adder can be formulated as a parallel
prefix problem by employing a special operator “ ° ”.

This operator is associative hence it can be implemented in a parallel
fashion.

A Parallel Prefix Adder (PPA) is equivalent to the CLA adder… The two
differ in the way their carry generation block is implemented.

In subsequent slides we will see different topologies for the parallel
generation of carries. Adders that use these topologies are called Parallel
Prefix Adders.
Parallel Prefix Adders
   The parallel prefix adder employs the 3-stage structure
    of the CLA adder. The improvement is in the carry
    generation stage, which is the most intensive one:

     1. Pre-calculation of Pi, Gi terms.
        Straightforward, as in the CLA adder.

     2. Calculation of the carries. This part is parallelizable to reduce time.
        Prefix graphs can be used to describe the structure that performs this part.

     3. Simple adder to generate the sum.
        Straightforward, as in the CLA adder.
Calculation of carries – Prefix Graphs
The components usually seen in a prefix graph are the following:

   Processing component:
        inputs:  ( gin1 , pin1 ) and ( gin2 , pin2 )
        output:  ( gout , pout ) = ( gin1 + pin1 ⋅ gin2 , pin1 ⋅ pin2 )

   Buffer component:
        input:   ( gin , pin )
        output:  ( gout , pout ) = ( gin , pin )
Prefix graphs for representation of Prefix addition
   Example: serial adder carry generation represented by prefix graphs

   [Figure: serial prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]
Key architectures for carry calculation:
   1960:   J. Sklansky – conditional-sum adder
   1973:   Kogge-Stone adder
   1980:   Ladner-Fischer adder
   1982:   Brent-Kung adder
   1987:   Han-Carlson adder
   1999:   S. Knowles


Other parallel adder architectures:
   1981: H. Ling adder
   2001: Beaumont-Smith
1960: J. Sklansky – conditional-sum adder

   [Figure: Sklansky prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]

   The Sklansky adder has:
     Minimal depth
     High fan-out nodes
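The Sklansky graph can be modelled in a few lines. This is a sketch under stated assumptions: the names are illustrative, bit 0 is the least significant (the slides number the inputs 1 to 8), the width is a power of two, and only processing components are counted as nodes.

```python
def prefix_op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def gp_pairs(x, y, n):
    """(g_i, p_i) for each bit i of x and y, least significant bit first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
            for i in range(n)]


def sklansky(pairs):
    """Return (prefixes, node_count, depth) for a Sklansky-style graph.

    On level d the upper half of every block of 2**(d+1) bits combines with
    the top element of the lower half, so that one element fans out to
    2**d nodes.
    """
    node = list(pairs)
    n = len(node)
    nodes = depth = 0
    d = 0
    while (1 << d) < n:
        prev = list(node)                    # a level reads the previous level only
        for i in range(n):
            if (i >> d) & 1:
                j = ((i >> d) << d) - 1      # top of the lower half-block
                node[i] = prefix_op(prev[i], prev[j])
                nodes += 1
        depth += 1
        d += 1
    return node, nodes, depth


out, nodes, depth = sklansky(gp_pairs(0b10110101, 0b01101011, 8))
print([g for g, _ in out], nodes, depth)
```

For 8 bits the model uses 12 processing nodes in 3 levels, and on the last level a single element drives 4 nodes, which is the high fan-out noted above.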
1973: Kogge-Stone adder

   [Figure: Kogge-Stone prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]

   The Kogge-Stone adder has:
     Low depth
     High node count (implies more area).
     Minimal, bounded fan-out at each node (implies faster performance).
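A Kogge-Stone graph can be modelled the same way. In this sketch (illustrative names, bit 0 least significant, only processing components counted) every position combines with the position a power-of-two distance below it.

```python
def prefix_op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def gp_pairs(x, y, n):
    """(g_i, p_i) for each bit i of x and y, least significant bit first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
            for i in range(n)]


def kogge_stone(pairs):
    """Return (prefixes, node_count, depth) for a Kogge-Stone-style graph."""
    node = list(pairs)
    n = len(node)
    nodes = depth = 0
    dist = 1
    while dist < n:
        prev = list(node)                 # a level reads the previous level only
        for i in range(dist, n):
            node[i] = prefix_op(prev[i], prev[i - dist])
            nodes += 1
        depth += 1
        dist *= 2
    return node, nodes, depth


out, nodes, depth = kogge_stone(gp_pairs(0b10110101, 0b01101011, 8))
print([g for g, _ in out], nodes, depth)
```

For 8 bits this gives 17 processing nodes in 3 levels: the same depth as the Sklansky model, but more nodes and wires in exchange for a small, uniform load on every node.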
1980: Ladner-Fischer adder

   [Figure: Ladner-Fischer prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]

   The Ladner-Fischer adder has:
     Low depth
     High fan-out nodes
       This adder topology appears the same as the Sklansky conditional-sum adder. Ladner and Fischer formulated
        a parallel prefix network design space which included this minimal depth case. The actual adder they
        included as an application of their work had a structure that was slightly different from the above.
1982: Brent-Kung adder

   [Figure: Brent-Kung prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]

   The Brent-Kung adder is the extreme boundary case of:
     Maximum logic depth in PP adders (implies longer calculation
      time).
     Minimum number of nodes (implies minimum area).
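The Brent-Kung structure is often described as an up-sweep that builds power-of-two groups followed by a down-sweep that fills in the remaining positions. The sketch below follows that description; the names are illustrative, bit 0 is the least significant, the width is assumed to be a power of two, and only processing components are counted.

```python
def prefix_op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def gp_pairs(x, y, n):
    """(g_i, p_i) for each bit i of x and y, least significant bit first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
            for i in range(n)]


def brent_kung(pairs):
    """Return (prefixes, node_count, depth) for a Brent-Kung-style graph."""
    node = list(pairs)
    n = len(node)
    nodes = depth = 0

    step = 2
    while step <= n:                      # up-sweep: groups of 2, 4, 8, ... bits
        for i in range(step - 1, n, step):
            node[i] = prefix_op(node[i], node[i - step // 2])
            nodes += 1
        depth += 1
        step *= 2

    step = n // 2
    while step >= 2:                      # down-sweep: complete the other positions
        for i in range(step + step // 2 - 1, n, step):
            node[i] = prefix_op(node[i], node[i - step // 2])
            nodes += 1
        depth += 1
        step //= 2
    return node, nodes, depth


out, nodes, depth = brent_kung(gp_pairs(0b10110101, 0b01101011, 8))
print([g for g, _ in out], nodes, depth)
```

For 8 bits the model needs only 11 processing nodes but 5 levels, compared with 3 levels for the minimal-depth graphs.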
1987: Han-Carlson adder

   The Han-Carlson adder combines the Brent-Kung and
    Kogge-Stone structures into a hybrid structure.
     Efficient
     Suitable for VLSI implementation.
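One common description of the hybrid is: a Brent-Kung-style first level on the odd positions, Kogge-Stone levels on the odd positions only, and a final level that completes the even positions. The sketch below follows that description and is an assumption about the structure, not a transcription of the slide's figure; names are illustrative, bit 0 is the least significant, and the width is assumed to be a power of two.

```python
def prefix_op(hi, lo):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def gp_pairs(x, y, n):
    """(g_i, p_i) for each bit i of x and y, least significant bit first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
            for i in range(n)]


def han_carlson(pairs):
    """Return (prefixes, node_count, depth) for a Han-Carlson-style graph."""
    node = list(pairs)
    n = len(node)
    nodes = depth = 0

    for i in range(1, n, 2):              # first level: odd bits absorb their neighbour
        node[i] = prefix_op(node[i], node[i - 1])
        nodes += 1
    depth += 1

    dist = 2
    while dist < n:                       # Kogge-Stone levels on the odd bits only
        prev = list(node)
        for i in range(dist + 1, n, 2):
            node[i] = prefix_op(prev[i], prev[i - dist])
            nodes += 1
        depth += 1
        dist *= 2

    for i in range(2, n, 2):              # last level: even bits use the odd bit below
        node[i] = prefix_op(node[i], node[i - 1])
        nodes += 1
    depth += 1
    return node, nodes, depth


out, nodes, depth = han_carlson(gp_pairs(0b10110101, 0b01101011, 8))
print([g for g, _ in out], nodes, depth)
```

For 8 bits this model uses 12 processing nodes in 4 levels, which sits between the Kogge-Stone and Brent-Kung models in both node count and depth.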
1999: S. Knowles
   Knowles proposed adders that trade off:
     depth, interconnect, area.
   These adders are bounded by the Ladner-Fischer (minimum depth, high
   fan-out) and Kogge-Stone (minimum fan-out) topologies.

   [Figure: the Knowles topologies, with varied fan-out at each level, placed
   between the minimum fan-out topology and the Ladner-Fischer topology.]
An interesting taxonomy:
   Harris [2003] presented an interesting 3-D taxonomy of the adders
   presented so far. Each axis represents a characteristic of the adders:
     - Fanout
     - Logic depth
     - Wire connections

   He also proposed the following structure:

   [Figure: prefix structure proposed by Harris.]
1981: H. Ling adder
 Ling adders are a different family of adders.
 They can still be formulated as prefix adders.

 Ling adders differ from the “traditional” PP adders in that:
     They are based on a different set of equations.
     The new set of equations introduces the following tradeoffs:
        1. Pre-calculation of Pi, Gi terms is based on more complex equations.
        2. Calculation of the carries is based on simpler equations.
        3. Final addition stage is more complex.
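The slides do not list Ling's equations. The sketch below uses the textbook pseudo-carry formulation as an assumption: Hi = gi + ti−1 ⋅ Hi−1 with ti = xi + yi, the true carry recovered as ci = ti ⋅ Hi, and a carry-in of 0. The function name `ling_add` is illustrative.

```python
def ling_add(x, y, n):
    """Add two n-bit unsigned integers using Ling-style pseudo-carries."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    g = [a & b for a, b in zip(xb, yb)]     # generate
    t = [a | b for a, b in zip(xb, yb)]     # transmit (the "alive" bit)
    p = [a ^ b for a, b in zip(xb, yb)]     # propagate, needed for the sum

    # Pseudo-carry recurrence: H_0 = g_0, H_i = g_i + t_(i-1) * H_(i-1)
    h = [g[0]]
    for i in range(1, n):
        h.append(g[i] | (t[i - 1] & h[i - 1]))

    # The final stage must turn each pseudo-carry back into a carry
    c = [t[i] & h[i] for i in range(n)]
    s = [p[0]] + [p[i] ^ c[i - 1] for i in range(1, n)]
    s.append(c[n - 1])
    return sum(bit << i for i, bit in enumerate(s))


print(ling_add(11, 6, 4))   # 17
```

In this formulation the carry network propagates H, which drops the ti factor of the current stage, and the extra AND in the last stage is one way to read the "more complex" final addition.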
Beaumont-Smith
[Figure: 8-bit prefix graph for bits 7 … 0 with outputs c8 … c1. The node rows
are labelled 7-6, 6-5, 5-4, 3-2, 2-1, 1-0; then 7-5, 6-4, 3-1, 2-0; then 7-4, 3-0;
then 7-0, 6-0, 5-0, 4-0.]

Group equations shown in the figure:
    G1:0 = G1 + P1G0
    G2:1 = G2 + P2G1
    G2:0 = G2 + P2G1 + P2P1G0
    G3:2 = G3 + P3G2
    G3:1 = G3 + P3G2 + P3P2G1
    G3:0 = G3 + P3G2 + P3P2G1 + P3P2P1G0
    G4:0 = G4 + P4G3 + P4P3G2 + P4P3P2G1 + P4P3P2P1G0
    G5:4 = G5 + P5G4
    P5:4 = P5P4
    G5:0 = G5:4 + P5:4G4:0
    G6:5 = G6 + P6G5
    G6:4 = G6 + P6G5 + P6P5G4
    G6:0 = G6 + P6G5:4 + P6P5:4G4:0
    G7:6 = G7 + P7G6
    G7:5 = G7 + P7G6 + P7P6G5
    G7:4 = G7 + P7G6 + P7P6G5 + P7P6P5G4
    P7:4 = P7P6P5P4
    G7:0 = G7:4 + P7:4G4:0
Summary (1/3)
   The parallel prefix formulation of binary addition
    is a very convenient way to formally describe an
    entire family of parallel binary adders.
Summary (2/3)
   A parallel prefix adder can be seen as a 3-stage process:

                                 Pre-calculation of Pi, Gi terms


                                  Calculation of the carries.


                              Simple adder to generate the sum


   There exist various architectures for the carry calculation part.
   Trade-offs in these architectures involve the
        area of the adder
        its depth
        the fan-out of the nodes
        the overall wiring network.
Summary (3/3)
   Variations of parallel adders have been
    proposed. These variations are based on:
     Modifying the carry generation equations and
      reformulating the prefix definition (Ling)
     Restructuring the carry calculation trees by
      optimizing for a specific technology (Beaumont-Smith)
     Other optimizations.
References:
Beaumont-Smith, Cheng-Chew Lim, “Parallel Prefix Adder Design”, IEEE, 2001

Han, Carlson, “Fast Area-Efficient VLSI Adders”, IEEE, 1987

Dimitrakopoulos, Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders”, IEEE, 2005

Kogge, Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations”, IEEE, 1973

Simon Knowles, “A Family of Adders”, IEEE, 2001

Ladner, Fischer, “Parallel Prefix Computation”, ACM, 1980

Brent, Kung, “A Regular Layout for Parallel Adders”, IEEE, 1982

H. Ling, “High-Speed Binary Adder”, IBM J. Res. and Dev., 1981

J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, 1960

D. Harris, “A Taxonomy of Parallel Prefix Networks”, IEEE, 2003

Más contenido relacionado

La actualidad más candente

Programmable Logic Devices Plds
Programmable Logic Devices PldsProgrammable Logic Devices Plds
Programmable Logic Devices PldsGaditek
 
MAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSMAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSBhamidipati Gayatri
 
Booths Multiplication Algorithm
Booths Multiplication AlgorithmBooths Multiplication Algorithm
Booths Multiplication Algorithmknightnick
 
8 bit Multiplier Accumulator
8 bit Multiplier Accumulator8 bit Multiplier Accumulator
8 bit Multiplier AccumulatorDaksh Raj Chopra
 
Quine Mc Clusky (Tabular) method
Quine Mc Clusky (Tabular) methodQuine Mc Clusky (Tabular) method
Quine Mc Clusky (Tabular) methodSyed Saeed
 
Solution manual 8051 microcontroller by mazidi
Solution manual 8051 microcontroller by mazidiSolution manual 8051 microcontroller by mazidi
Solution manual 8051 microcontroller by mazidiMuhammad Abdullah
 
Low power high_speed
Low power high_speedLow power high_speed
Low power high_speednanipandu
 
Combinational Circuits & Sequential Circuits
Combinational Circuits & Sequential CircuitsCombinational Circuits & Sequential Circuits
Combinational Circuits & Sequential Circuitsgourav kottawar
 
Microprocessor Interfacing and 8155 Features
Microprocessor Interfacing and 8155 FeaturesMicroprocessor Interfacing and 8155 Features
Microprocessor Interfacing and 8155 FeaturesSrikrishna Thota
 
RF Module Design - [Chapter 6] Power Amplifier
RF Module Design - [Chapter 6]  Power AmplifierRF Module Design - [Chapter 6]  Power Amplifier
RF Module Design - [Chapter 6] Power AmplifierSimen Li
 
Data flow model -Lecture-4
Data flow model -Lecture-4Data flow model -Lecture-4
Data flow model -Lecture-4Dr.YNM
 
MIPI DevCon 2016: A Developer's Guide to MIPI I3C Implementation
MIPI DevCon 2016: A Developer's Guide to MIPI I3C ImplementationMIPI DevCon 2016: A Developer's Guide to MIPI I3C Implementation
MIPI DevCon 2016: A Developer's Guide to MIPI I3C ImplementationMIPI Alliance
 
Design & implementation of high speed carry select adder
Design & implementation of high speed carry select adderDesign & implementation of high speed carry select adder
Design & implementation of high speed carry select adderssingh7603
 

La actualidad más candente (20)

Stick Diagram
Stick DiagramStick Diagram
Stick Diagram
 
Kogge Stone Adder
Kogge Stone AdderKogge Stone Adder
Kogge Stone Adder
 
Programmable Logic Devices Plds
Programmable Logic Devices PldsProgrammable Logic Devices Plds
Programmable Logic Devices Plds
 
MAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERSMAC UNIT USING DIFFERENT MULTIPLIERS
MAC UNIT USING DIFFERENT MULTIPLIERS
 
Booths Multiplication Algorithm
Booths Multiplication AlgorithmBooths Multiplication Algorithm
Booths Multiplication Algorithm
 
8 bit Multiplier Accumulator
8 bit Multiplier Accumulator8 bit Multiplier Accumulator
8 bit Multiplier Accumulator
 
Quine Mc Clusky (Tabular) method
Quine Mc Clusky (Tabular) methodQuine Mc Clusky (Tabular) method
Quine Mc Clusky (Tabular) method
 
Chapter 4: Combinational Logic
Chapter 4: Combinational LogicChapter 4: Combinational Logic
Chapter 4: Combinational Logic
 
Solution manual 8051 microcontroller by mazidi
Solution manual 8051 microcontroller by mazidiSolution manual 8051 microcontroller by mazidi
Solution manual 8051 microcontroller by mazidi
 
Low power high_speed
Low power high_speedLow power high_speed
Low power high_speed
 
Combinational Circuits & Sequential Circuits
Combinational Circuits & Sequential CircuitsCombinational Circuits & Sequential Circuits
Combinational Circuits & Sequential Circuits
 
Comparator
ComparatorComparator
Comparator
 
Multipliers in VLSI
Multipliers in VLSIMultipliers in VLSI
Multipliers in VLSI
 
Ripple Carry Adder
Ripple Carry AdderRipple Carry Adder
Ripple Carry Adder
 
Microprocessor Interfacing and 8155 Features
Microprocessor Interfacing and 8155 FeaturesMicroprocessor Interfacing and 8155 Features
Microprocessor Interfacing and 8155 Features
 
RF Module Design - [Chapter 6] Power Amplifier
RF Module Design - [Chapter 6]  Power AmplifierRF Module Design - [Chapter 6]  Power Amplifier
RF Module Design - [Chapter 6] Power Amplifier
 
Data flow model -Lecture-4
Data flow model -Lecture-4Data flow model -Lecture-4
Data flow model -Lecture-4
 
Multiplexers & Demultiplexers
Multiplexers & DemultiplexersMultiplexers & Demultiplexers
Multiplexers & Demultiplexers
 
MIPI DevCon 2016: A Developer's Guide to MIPI I3C Implementation
MIPI DevCon 2016: A Developer's Guide to MIPI I3C ImplementationMIPI DevCon 2016: A Developer's Guide to MIPI I3C Implementation
MIPI DevCon 2016: A Developer's Guide to MIPI I3C Implementation
 
Design & implementation of high speed carry select adder
Design & implementation of high speed carry select adderDesign & implementation of high speed carry select adder
Design & implementation of high speed carry select adder
 

Similar a Parallel Prefix Adders Presentation

Similar a Parallel Prefix Adders Presentation (20)

Complex numbers
Complex numbersComplex numbers
Complex numbers
 
3306565.ppt
3306565.ppt3306565.ppt
3306565.ppt
 
Complex numbers
Complex numbersComplex numbers
Complex numbers
 
Lesson 22: Quadratic Forms
Lesson 22: Quadratic FormsLesson 22: Quadratic Forms
Lesson 22: Quadratic Forms
 
5.1 quadratic functions
5.1 quadratic functions5.1 quadratic functions
5.1 quadratic functions
 
5HBC2012 Conic Worksheet
5HBC2012 Conic Worksheet5HBC2012 Conic Worksheet
5HBC2012 Conic Worksheet
 
Cs 601
Cs 601Cs 601
Cs 601
 
combinational-circuit (1).ppt
combinational-circuit (1).pptcombinational-circuit (1).ppt
combinational-circuit (1).ppt
 
Em09 cn
Em09 cnEm09 cn
Em09 cn
 
Number systems and Boolean Reduction
Number systems and Boolean ReductionNumber systems and Boolean Reduction
Number systems and Boolean Reduction
 
Logicgates
LogicgatesLogicgates
Logicgates
 
Lecture 18 M - Copy.pptx
Lecture 18 M - Copy.pptxLecture 18 M - Copy.pptx
Lecture 18 M - Copy.pptx
 
Math graphing quatratic equation
Math graphing quatratic equationMath graphing quatratic equation
Math graphing quatratic equation
 
Unit 4 dica
Unit 4 dicaUnit 4 dica
Unit 4 dica
 
Unexpected ineq
Unexpected ineqUnexpected ineq
Unexpected ineq
 
100 Inequalities Problems
100 Inequalities Problems100 Inequalities Problems
100 Inequalities Problems
 
Sect4 5
Sect4 5Sect4 5
Sect4 5
 
Inequalities marathon
Inequalities  marathonInequalities  marathon
Inequalities marathon
 
Applied 40S May 25, 2009
Applied 40S May 25, 2009Applied 40S May 25, 2009
Applied 40S May 25, 2009
 
AA Section 3-4
AA Section 3-4AA Section 3-4
AA Section 3-4
 

Más de Peeyush Pashine (16)

Temperature Controlled Fan Report
Temperature Controlled Fan ReportTemperature Controlled Fan Report
Temperature Controlled Fan Report
 
Temperature Controlled Fan
Temperature Controlled FanTemperature Controlled Fan
Temperature Controlled Fan
 
Robots
RobotsRobots
Robots
 
Power Ingredients
Power IngredientsPower Ingredients
Power Ingredients
 
Itms
ItmsItms
Itms
 
Ecg
EcgEcg
Ecg
 
Dsp Presentation
Dsp PresentationDsp Presentation
Dsp Presentation
 
Adder Presentation
Adder PresentationAdder Presentation
Adder Presentation
 
My Report on adders
My Report on addersMy Report on adders
My Report on adders
 
Decimal arithmetic in Processors
Decimal arithmetic in ProcessorsDecimal arithmetic in Processors
Decimal arithmetic in Processors
 
Control Unit Working
Control Unit WorkingControl Unit Working
Control Unit Working
 
Smith Adder
Smith AdderSmith Adder
Smith Adder
 
Smith Adder
Smith AdderSmith Adder
Smith Adder
 
Good report on Adders/Prefix adders
Good report on Adders/Prefix addersGood report on Adders/Prefix adders
Good report on Adders/Prefix adders
 
111adder
111adder111adder
111adder
 
Report adders
Report addersReport adders
Report adders
 

Parallel Prefix Adders Presentation

  • 1. Parallel prefix adders Kostas Vitoroulis, 2006. Presented to Dr. A. J. Al-Khalili. Concordia University.
  • 2. Overview of presentation  Parallel prefix operations  Binary addition as a parallel prefix operation  Prefix graphs  Adder topologies  Summary
  • 3. Parallel Prefix Operation Terminology background:  Prefix: The outcome of the operation depends on the initial inputs.  Parallel: Involves the execution of an operation in parallel. This is done by segmentation into smaller pieces that are computed in parallel.  Operation: Any arbitrary primitive operator “ ° ” that is associative is parallelizable  it is fast because the processing is accomplished in a parallel fashion.
  • 4. Example: Associative operations are parallelizable Consider the logical OR operation: a + b The operation is associative: a + b + c + d = ((( a + b ) + c) + d ) = (( a + b ) + ( c + d)) Serial implementation: Parallel implementation: a+b a a+b b (a+b)+(c+d) (a+b)+c c d c+d ((a+b)+c)+d
  • 5. Mathematical Formulation: Prefix Sum  Operator: “ ° ”  this is the unary operator known as “scan” or “prefix sum”  Input is a vector: A = AnAn-1 … A1  Output is another vector: B = BnBn-1 … B1 where B1 = A1  Bn represents the B2 = A1 ° A2 operator being applied to … all terms of the vector. Bn = A1 ° A2 … ° An
  • 6. Example of prefix sum Consider the vector: A = AnAn-1 … A1 where element Ai is an integer The “*” unary operator, defined as: *A = B With B = BnBn-1 … B1 B 1 = A1 B2 = A1 * A2 B 3 = A 1 * A 1 * A3 … and ‘ * ’ here is the integer addition operation.
  • 7. Example of prefix sum Calculation of *A, where A = 6 5 4 3 2 1 yields: B = *A = 21 15 10 6 3 1 Because the summation is associative the calculation can be done in parallel in the following manner: Parallel implementation versus Serial implementation 6 5 4 3 2 1 6 5 4 3 2 1 + + + + + + B = (A + A +3A3 B6 =3B61+…A1 + A2) = A 2 = A11 = 12 +A + 1 = (A6 + A+) + + + =6 5 ((A4+A3) +(A2 +A1)) + += 21 B6 B5 B4 B3 B2 B1 B6 B5 B4 B3 B2 B1
  • 8. Binary Addition. This is the pen and paper addition of two 4-bit binary numbers x and y; c represents the generated carries and s represents the produced sum bits:
         c3 c2 c1 c0
            x3 x2 x1 x0
       +    y3 y2 y1 y0
         s4 s3 s2 s1 s0
    A stage of the addition is the set of x and y bits being used to produce the appropriate sum and carry bits. For example, the bits x2, y2 constitute stage 2, which generates carry c2 and sum s2. Each stage i adds bits $x_i$, $y_i$, $c_{i-1}$ and produces bits $s_i$, $c_i$. The following hold:
         x_i  y_i  c_i      Comment                                    Formal definition
         0    0    0        The stage “kills” an incoming carry.       “Kill” bit: $k_i = \overline{x_i + y_i}$
         0    1    c_{i-1}  The stage “propagates” an incoming carry.  “Propagate” bit: $p_i = x_i \oplus y_i$
         1    0    c_{i-1}  The stage “propagates” an incoming carry.
         1    1    1        The stage “generates” a carry out.         “Generate” bit: $g_i = x_i \cdot y_i$
  • 9. Binary Addition. With $k_i = \overline{x_i + y_i}$, $p_i = x_i \oplus y_i$ and $g_i = x_i \cdot y_i$ as in the table on the previous slide, the carry $c_i$ generated by a stage i is given by the equation: $c_i = g_i + p_i \cdot c_{i-1} = x_i \cdot y_i + (x_i \oplus y_i) \cdot c_{i-1}$. This equation can be simplified to: $c_i = x_i \cdot y_i + (x_i + y_i) \cdot c_{i-1} = g_i + a_i \cdot c_{i-1}$, the $a_i = x_i + y_i$ term being the “alive” bit. The two forms differ only when $x_i = y_i = 1$, where $g_i = 1$ already forces $c_i = 1$. The latter form of the equation uses an OR gate instead of an XOR, and OR is a more efficient gate when implemented in CMOS technology. Note that $a_i = \overline{k_i}$, where $k_i$ is the “kill” bit defined in the table.
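The equivalence of the two carry equations can be checked exhaustively over the eight input combinations of one stage. The sketch below is only an illustration; the function names are hypothetical and bits are modelled as Python integers 0 and 1.

```python
from itertools import product


def carry_propagate(x, y, c_in):
    """c_i = g_i + p_i * c_{i-1}, with p_i = x_i XOR y_i."""
    g = x & y
    p = x ^ y
    return g | (p & c_in)


def carry_alive(x, y, c_in):
    """c_i = g_i + a_i * c_{i-1}, with a_i = x_i OR y_i."""
    g = x & y
    a = x | y
    return g | (a & c_in)


def kill(x, y):
    """k_i = NOT(x_i OR y_i), the complement of the alive bit."""
    return 1 - (x | y)


all_inputs = list(product((0, 1), repeat=3))
forms_agree = all(carry_propagate(x, y, c) == carry_alive(x, y, c) for x, y, c in all_inputs)
# The carry out of a full adder is 1 exactly when at least two inputs are 1.
matches_full_adder = all(carry_alive(x, y, c) == (x + y + c) // 2 for x, y, c in all_inputs)
alive_is_not_kill = all((x | y) == 1 - kill(x, y) for x, y, _ in all_inputs)
```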
  • 10. Carry Look Ahead adders. The CLA adder has the following 3-stage structure: (1) pre-calculation of $p_i$, $g_i$ for each stage; (2) calculation of the carry $c_i$ for each stage; (3) combination of $c_i$ and $p_i$ of each stage to generate the sum bits $s_i$ of the final sum.
  • 11. Carry Look Ahead adders. The pre-calculation stage is implemented using the equations for $p_i$, $g_i$ shown on a previous slide. [Figure: inputs x2 y2, x1 y1, x0 y0 producing g2 p2, g1 p1, g0 p0.] Alternatively, using the “alive” bit: [Figure: inputs x2 y2, x1 y1, x0 y0 producing g2 a2, g1 a1, g0 a0.] Note the symmetry when we use the “propagate” or the “alive” bit: we can use them interchangeably in the carry equations.
  • 12. Carry Look Ahead adders. The carry calculation stage is implemented using the equations produced when unfolding the recursive equation $c_i = g_i + p_i \cdot c_{i-1} = g_i + a_i \cdot c_{i-1}$: $c_0 = g_0$; $c_1 = g_1 + p_1 \cdot g_0$; $c_2 = g_2 + p_2 \cdot c_1 = g_2 + p_2 \cdot (g_1 + p_1 \cdot g_0) = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0$; etc. [Figure: carry generator block taking g2 p2, g1 p1, g0 p0 and producing c2, c1, c0.]
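  • 13. Carry Look Ahead adders. The final sum calculation stage is implemented using the carry and propagate bits $c_i$, $p_i$: $s_i = p_i \oplus c_{i-1}$, with $p_i = x_i \oplus y_i$, so that $s_3 = p_3 \oplus c_2$, $s_2 = p_2 \oplus c_1$, $s_1 = p_1 \oplus c_0$, $s_0 = p_0 \oplus c_{in}$. Note: with $a_i = x_i + y_i$, $s_i \neq a_i \oplus c_{i-1}$ in general; the propagate bit has to be recovered first, for example as $p_i = a_i \cdot \overline{g_i}$. If the “alive” bit $a_i$ is used, the final sum stage therefore becomes more complex.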
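The three CLA stages can be put together in a small behavioural model. This is a sketch under stated assumptions, not a gate-level design: `to_bits` and `cla_add` are hypothetical names, bits are stored least significant first, and the carry-in `c_in` defaults to 0, which reproduces the unfolded equations with $c_0 = g_0$.

```python
def to_bits(value, n):
    """Bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def cla_add(x, y, n=4, c_in=0):
    xb, yb = to_bits(x, n), to_bits(y, n)

    # Stage 1: pre-calculation of generate and propagate bits.
    g = [a & b for a, b in zip(xb, yb)]
    p = [a ^ b for a, b in zip(xb, yb)]

    # Stage 2: unfolded carries,
    # c_i = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c_in.
    carries = []
    for i in range(n):
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        tail = c_in
        for k in range(i + 1):
            tail &= p[k]
        carries.append(c | tail)

    # Stage 3: s_i = p_i XOR c_{i-1}; the last carry becomes the top sum bit.
    carry_into = [c_in] + carries[:-1]
    s = [pi ^ ci for pi, ci in zip(p, carry_into)]
    return sum(bit << i for i, bit in enumerate(s)) + (carries[-1] << n)


example = cla_add(0b1011, 0b0110)  # 11 + 6
```

Each carry is computed directly from the $g$ and $p$ bits, so no carry waits for the one below it; the price is that the product terms grow with the bit position.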
  • 14. Binary addition as a prefix sum problem. We define a new operator: “ ° ”. Input is a vector of pairs of “propagate” and “generate” bits: $(g_n, p_n)(g_{n-1}, p_{n-1}) \dots (g_0, p_0)$. Output is a new vector of pairs: $(G_n, P_n)(G_{n-1}, P_{n-1}) \dots (G_0, P_0)$. Each pair of the output vector is calculated by the following definition: $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$, where $(G_0, P_0) = (g_0, p_0)$ and $(g_x, p_x) \circ (g_y, p_y) = (g_x + p_x \cdot g_y,\ p_x \cdot p_y)$, with $+$, $\cdot$ being the OR, AND operations.
  • 15. Binary addition as a prefix sum problem. Properties of operator “ ° ”. Associativity (hence parallelization): easy to prove based on the fact that the logical AND, OR operations are associative and that AND distributes over OR. With the definition $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$, where $(G_1, P_1) = (g_1, p_1)$, $G_i$ becomes the carry signal at stage i of an adder (illustration on next slide). The operation is idempotent: $(g_x, p_x) \circ (g_x, p_x) = (g_x + p_x \cdot g_x,\ p_x \cdot p_x) = (g_x, p_x)$, which implies that the combined groups may overlap: $(G_{i:j}, P_{i:j}) = (G_{i:n}, P_{i:n}) \circ (G_{m:j}, P_{m:j})$, where $i \ge j$ and $m \ge n$.
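These properties can be confirmed by brute force, since a pair has only four possible values. The sketch below is illustrative: `op` and `fold` are hypothetical names, and a pair is modelled as a tuple `(g, p)` of 0/1 integers. It also shows that the operator is not commutative, so the more significant group must stay on the left.

```python
from itertools import product


def op(left, right):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    g_x, p_x = left
    g_y, p_y = right
    return (g_x | (p_x & g_y), p_x & p_y)


def fold(group):
    """Combine pairs listed from most significant to least significant."""
    result = group[0]
    for pair in group[1:]:
        result = op(result, pair)
    return result


pairs = list(product((0, 1), repeat=2))

associative = all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(pairs, repeat=3))
idempotent = all(op(a, a) == a for a in pairs)
commutative = all(op(a, b) == op(b, a) for a, b in product(pairs, repeat=2))

# Overlapping groups: stages 3:1 combined with stages 2:0 give stages 3:0.
overlap_ok = all(op(fold(v[0:3]), fold(v[1:4])) == fold(v) for v in product(pairs, repeat=4))
```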
  • 16. Binary Addition as a prefix sum problem. For the addition of a3 a2 a1 a0 and b3 b2 b1 b0 (here $a_i$, $b_i$ denote the operand bits, not the “alive” bit), a stage i will generate a carry if $g_i = a_i \cdot b_i$ and propagate a carry if $p_i = a_i \oplus b_i$. Hence for stage i: $c_i = g_i + p_i \cdot c_{i-1}$. With $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ and $(g_x, p_x) \circ (g_y, p_y) = (g_x + p_x \cdot g_y,\ p_x \cdot p_y)$, and numbering the least significant stage 1 as on the previous slide, we have:
      $(G_1, P_1) = (g_1, p_1)$
      $(G_2, P_2) = (g_2, p_2) \circ (G_1, P_1) = (g_2 + p_2 \cdot g_1,\ p_2 \cdot p_1)$
      $(G_3, P_3) = (g_3, p_3) \circ (G_2, P_2) = (g_3 + p_3 \cdot (g_2 + p_2 \cdot g_1),\ p_3 \cdot p_2 \cdot p_1) = (g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1,\ p_3 \cdot p_2 \cdot p_1)$
      etc.
    These are the familiar carry bit generating equations for stage i in a CLA adder.
  • 17. Addition as a prefix sum problem. Conclusion: The equations of the well known CLA adder can be formulated as a parallel prefix problem by employing a special operator “ ° ”. This operator is associative hence it can be implemented in a parallel fashion. A Parallel Prefix Adder (PPA) is equivalent to the CLA adder… The two differ in the way their carry generation block is implemented. In subsequent slides we will see different topologies for the parallel generation of carries. Adders that use these topologies are called Parallel Prefix Adders.
  • 18. Parallel Prefix Adders. The parallel prefix adder employs the 3-stage structure of the CLA adder. The improvement is in the carry generation stage, which is the most intensive one: (1) Pre-calculation of $P_i$, $G_i$ terms: straightforward, as in the CLA adder. (2) Calculation of the carries: this part is parallelizable to reduce time, and prefix graphs can be used to describe the structure that performs it. (3) Simple adder to generate the sum: straightforward, as in the CLA adder.
  • 19. Calculation of carries – Prefix Graphs. The components usually seen in a prefix graph are the following. Processing component: takes the inputs $(g_{in1}, p_{in1})$ and $(g_{in2}, p_{in2})$ and produces $(g_{out}, p_{out}) = (g_{in1} + p_{in1} \cdot g_{in2},\ p_{in1} \cdot p_{in2})$. Buffer component: takes the input $(g_{in}, p_{in})$ and produces $(g_{out}, p_{out}) = (g_{in}, p_{in})$.
  • 20. Prefix graphs for representation of Prefix addition. Example: serial adder carry generation represented by prefix graphs. [Figure: serial prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.]
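The serial prefix graph corresponds to a chain of processing components, one per stage, each waiting for the result of the stage below it. A minimal sketch, assuming a carry-in of 0 and using hypothetical function names; list index 0 holds the slide's stage 1, so the returned list is c1 … c8 in that order.

```python
def op(left, right):
    """Processing component: (g_x + p_x * g_y, p_x * p_y)."""
    g_x, p_x = left
    g_y, p_y = right
    return (g_x | (p_x & g_y), p_x & p_y)


def serial_prefix_carries(x, y, n=8):
    """Carries from a chain of n - 1 dependent processing components."""
    pairs = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    prefix = [pairs[0]]  # the first stage only needs a buffer
    for pair in pairs[1:]:
        prefix.append(op(pair, prefix[-1]))
    return [g for g, _ in prefix]


def reference_carries(x, y, n=8):
    """Carry out of each bit position, read off ordinary integer addition."""
    out = []
    for i in range(n):
        mask = (1 << (i + 1)) - 1
        out.append(((x & mask) + (y & mask)) >> (i + 1))
    return out


example = serial_prefix_carries(0b00001111, 0b00000001)
```

The chain has the minimum number of processing components (n - 1) but also the maximum depth (n - 1), which is what the parallel topologies on the following slides improve.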
  • 21. Key architectures for carry calculation: 1960: J. Sklansky – conditional adder; 1973: Kogge-Stone adder; 1980: Ladner-Fischer adder; 1982: Brent-Kung adder; 1987: Han-Carlson adder; 1999: S. Knowles. Other parallel adder architectures: 1981: H. Ling adder; 2001: Beaumont-Smith.
  • 22. 1960: J. Sklansky – conditional adder
  • 23. 1960: J. Sklansky – conditional adder. [Figure: Sklansky prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.] The Sklansky adder has: minimal depth; high fan-out nodes.
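The Sklansky structure can be sketched as a divide-and-conquer network: at level l, every position in the upper half of a block of size 2^(l+1) combines with the top of the lower half. The code below is an illustration written with a power-of-two width in mind; `sklansky` and `add` are hypothetical names, the carry-in is assumed to be 0, and fan-out is counted as the number of processing components fed by one node within a level.

```python
def op(left, right):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    g_x, p_x = left
    g_y, p_y = right
    return (g_x | (p_x & g_y), p_x & p_y)


def sklansky(pairs):
    """Return (prefix pairs, number of levels, largest fan-out seen)."""
    cur = list(pairs)
    n = len(cur)
    level = 0
    fanout = {}
    while (1 << level) < n:
        nxt = list(cur)
        for i in range(n):
            if (i >> level) & 1:
                j = ((i >> level) << level) - 1  # top of the lower half-block
                nxt[i] = op(cur[i], cur[j])
                fanout[(level, j)] = fanout.get((level, j), 0) + 1
        cur = nxt
        level += 1
    return cur, level, max(fanout.values())


def add(x, y, n=8):
    pairs = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    prefix, _, _ = sklansky(pairs)
    carries = [0] + [g for g, _ in prefix]  # carries[i] is the carry into bit i
    bits = [pairs[i][1] ^ carries[i] for i in range(n)] + [carries[n]]
    return sum(b << i for i, b in enumerate(bits))


_, levels, max_fanout = sklansky([(0, 1)] * 8)
```

For 8 bits this gives 3 levels, and the node at position 3 feeds four processing components in the last level, which is the high fan-out noted on the slide.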
  • 24. 1973: Kogge-Stone adder. [Figure: Kogge-Stone prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.] The Kogge-Stone adder has: low depth; high node count (implies more area); minimal fan-out of 1 at each node (implies faster performance).
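A Kogge-Stone network can be sketched as a doubling-distance scan: at each level every position combines with the position 1, 2, 4, … places below it. The code is illustrative only; the function names are hypothetical, and the result is checked against a plain serial chain of the same operator rather than against hardware.

```python
def op(left, right):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    g_x, p_x = left
    g_y, p_y = right
    return (g_x | (p_x & g_y), p_x & p_y)


def pg_pairs(x, y, n=8):
    """(g_i, p_i) for each bit, least significant first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]


def serial_chain(pairs):
    """Reference prefix computed one stage at a time."""
    out = [pairs[0]]
    for pair in pairs[1:]:
        out.append(op(pair, out[-1]))
    return out


def kogge_stone(pairs):
    """Return (prefix pairs, number of levels, number of processing nodes)."""
    cur = list(pairs)
    n = len(cur)
    d, levels, nodes = 1, 0, 0
    while d < n:
        # Every position i >= d combines with the group ending d places below.
        cur = [op(cur[i], cur[i - d]) if i >= d else cur[i] for i in range(n)]
        nodes += n - d
        d *= 2
        levels += 1
    return cur, levels, nodes


_, levels, nodes = kogge_stone(pg_pairs(0, 0))
```

For 8 bits the sketch uses 3 levels and 17 processing nodes, which illustrates the low depth and high node count listed on the slide.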
  • 25. 1980: Ladner-Fischer adder. [Figure: Ladner-Fischer prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.] The Ladner-Fischer adder has: low depth; high fan-out nodes. This adder topology appears the same as the Sklansky conditional-sum adder. Ladner and Fischer formulated a parallel prefix network design space which included this minimal depth case. The actual adder they included as an application of their work had a structure that was slightly different from the above.
  • 26. 1982: Brent-Kung adder. [Figure: Brent-Kung prefix graph with inputs (p8, g8) … (p1, g1) and outputs c8 … c1.] The Brent-Kung adder is the extreme boundary case of: maximum logic depth in PP adders (implies longer calculation time); minimum number of nodes (implies minimum area).
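A Brent-Kung network can be sketched as a tree that first builds group results over growing power-of-two blocks and then fills in the remaining positions on the way back down. The code is an illustration assuming a power-of-two width; the function names are hypothetical, and the result is checked against a serial chain of the same operator.

```python
def op(left, right):
    """(g_x, p_x) ° (g_y, p_y) = (g_x + p_x * g_y, p_x * p_y)."""
    g_x, p_x = left
    g_y, p_y = right
    return (g_x | (p_x & g_y), p_x & p_y)


def pg_pairs(x, y, n=8):
    """(g_i, p_i) for each bit, least significant first."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]


def serial_chain(pairs):
    """Reference prefix computed one stage at a time."""
    out = [pairs[0]]
    for pair in pairs[1:]:
        out.append(op(pair, out[-1]))
    return out


def brent_kung(pairs):
    """Return (prefix pairs, number of levels, number of processing nodes)."""
    cur = list(pairs)
    n = len(cur)  # assumed to be a power of two
    levels = nodes = 0
    d = 1
    while d < n:  # up-sweep: combine blocks of size d into blocks of size 2d
        for i in range(2 * d - 1, n, 2 * d):
            cur[i] = op(cur[i], cur[i - d])
            nodes += 1
        d *= 2
        levels += 1
    d = n // 4
    while d >= 1:  # down-sweep: complete the positions skipped on the way up
        for i in range(3 * d - 1, n, 2 * d):
            cur[i] = op(cur[i], cur[i - d])
            nodes += 1
        d //= 2
        levels += 1
    return cur, levels, nodes


_, levels, nodes = brent_kung(pg_pairs(0, 0))
```

For 8 bits the sketch uses 5 levels but only 11 processing nodes, compared with 3 levels for the minimum-depth topologies, which matches the depth versus area trade-off stated on the slide.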
  • 27. 1987: Han-Carlson adder. The Han-Carlson adder combines the Brent-Kung and Kogge-Stone structures into a hybrid structure. It is efficient and suitable for VLSI implementation.
  • 28. 1999: S. Knowles. Knowles proposed adders that trade off depth, interconnect, area. These adders are bound by the Ladner-Fischer (minimum depth) and Brent-Kung (minimum fanout) topologies. [Figure: Brent-Kung topology (minimum fan-out); Knowles topologies (varied fan-out at each level); Ladner-Fischer topology (minimum depth, high fanout).]
  • 29. An interesting taxonomy: Harris [2003] presented an interesting 3-D taxonomy of the adders presented so far. Each axis represents a characteristic of the adders: fanout; logic depth; wire connections. He also proposed the following structure: [Figure not reproduced in this transcript.]
  • 30. 1981: H. Ling adder. Ling adders are a different family of adders. They can still be formulated as prefix adders. Ling adders differ from the “traditional” PP adders in that they are based on a different set of equations. The new set of equations introduces the following tradeoffs: precalculation of $P_i$, $G_i$ terms is based on more complex equations; calculation of the carries is based on simpler equations; the final addition stage is more complex.
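The slide does not give Ling's equations, so the sketch below relies on an assumption: the commonly cited pseudo-carry formulation $H_i = c_i + c_{i-1}$, which satisfies $H_i = g_i + a_{i-1} \cdot H_{i-1}$ and $c_i = a_i \cdot H_i$ with the “alive” bit $a_i = x_i + y_i$. The function name `ling_add` is hypothetical, the carry-in is taken as 0, and the pseudo-carries are computed serially here for clarity rather than with a prefix network.

```python
def ling_add(x, y, n=8):
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    g = [a & b for a, b in zip(xb, yb)]      # generate
    alive = [a | b for a, b in zip(xb, yb)]  # alive
    p = [a ^ b for a, b in zip(xb, yb)]      # propagate, needed for the sum

    # Pseudo-carries (assumed formulation): H_0 = g_0, H_i = g_i + a_{i-1} * H_{i-1}.
    h = [g[0]]
    for i in range(1, n):
        h.append(g[i] | (alive[i - 1] & h[i - 1]))

    # The true carry is recovered only where it is needed: c_i = a_i * H_i.
    # This extra AND is what makes the final addition stage more complex.
    c = [alive[i] & h[i] for i in range(n)]
    s = [p[0]] + [p[i] ^ c[i - 1] for i in range(1, n)]
    return sum(bit << i for i, bit in enumerate(s)) + (c[n - 1] << n)


example = ling_add(173, 94)
```

Substituting $c_{i-1} = a_{i-1} \cdot H_{i-1}$ into $H_i = g_i + c_{i-1}$ gives the recurrence used above, and $a_i \cdot H_i = g_i + a_i \cdot c_{i-1}$ recovers the ordinary carry equation.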
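  • 31. Beaumont-Smith. [Figure: 8-bit prefix graph with inputs 7 … 0; group nodes 7-6, 6-5, 5-4, 3-2, 2-1, 1-0; 7-5, 6-4, 3-1, 2-0; 7-4, 3-0; 7-0, 6-0, 5-0, 4-0; outputs c8 … c1.] Node equations:
      $G_{1:0} = G_1 + P_1 G_0$
      $G_{2:1} = G_2 + P_2 G_1$
      $G_{2:0} = G_2 + P_2 G_1 + P_2 P_1 G_0$
      $G_{3:2} = G_3 + P_3 G_2$
      $G_{3:1} = G_3 + P_3 G_2 + P_3 P_2 G_1$
      $G_{3:0} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
      $G_{4:0} = G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1 + P_4 P_3 P_2 P_1 G_0$
      $G_{5:4} = G_5 + P_5 G_4$, $P_{5:4} = P_5 P_4$
      $G_{5:0} = G_{5:4} + P_{5:4} G_{4:0}$
      $G_{6:5} = G_6 + P_6 G_5$
      $G_{6:4} = G_6 + P_6 G_5 + P_6 P_5 G_4$
      $G_{6:0} = G_6 + P_6 G_{5:4} + P_6 P_{5:4} G_{4:0}$
      $G_{7:6} = G_7 + P_7 G_6$
      $G_{7:5} = G_7 + P_7 G_6 + P_7 P_6 G_5$
      $G_{7:4} = G_7 + P_7 G_6 + P_7 P_6 G_5 + P_7 P_6 P_5 G_4$, $P_{7:4} = P_7 P_6 P_5 P_4$
      $G_{7:0} = G_{7:4} + P_{7:4} G_{4:0}$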
  • 32. Summary (1/3). The parallel prefix formulation of binary addition is a very convenient way to formally describe an entire family of parallel binary adders.
  • 33. Summary (2/3). A parallel prefix adder can be seen as a 3-stage process: pre-calculation of $P_i$, $G_i$ terms; calculation of the carries; simple adder to generate the sum. There exist various architectures for the carry calculation part. Trade-offs in these architectures involve: the area of the adder; its depth; the fan-out of the nodes; the overall wiring network.
  • 34. Summary (3/3). Variations of parallel adders have been proposed. These variations are based on: modifying the carry generation equations and reformulating the prefix definition (Ling); restructuring the carry calculation trees by optimizing for a specific technology (Beaumont-Smith); other optimizations.
  • 35. References:
      Beaumont-Smith, Cheng-Chew Lim, “Parallel Prefix Adder Design”, IEEE, 2001
      Han, Carlson, “Fast Area-Efficient VLSI Adders”, IEEE, 1987
      Dimitrakopoulos, Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders”, IEEE, 2005
      Kogge, Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations”, IEEE, 1973
      Simon Knowles, “A Family of Adders”, IEEE, 2001
      Ladner, Fischer, “Parallel Prefix Computation”, ACM, 1980
      Brent, Kung, “A Regular Layout for Parallel Adders”, IEEE, 1982
      H. Ling, “High-Speed Binary Adder”, IBM J. Res. and Dev., 1980
      J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Computers, 1960
      D. Harris, “A Taxonomy of Parallel Prefix Networks”, IEEE, 2003