CSBP: A Fast Circuit Similarity-Based
          Placement for FPGA Incremental Design
               and Design Space Exploration

      Xiaoyu Shi¹, Dahua Zeng¹, Yu Hu², Guohui Lin¹, Osmar R. Zaiane¹

        ¹Dept. of Computing Science, University of Alberta
²Dept. of Electrical and Computer Engineering, University of Alberta

                           Presented by Xiaoyu Shi





                Please address comments to bryanhu@ece.ualberta.ca
Outline


      Introduction

      Circuit Similarity-Based Placement


      Experimental Results


      Conclusion and Future Work
Introduction
 Field Programmable Gate Array (FPGA)
    Ease of design, low start-up costs and fast manufacturing
     turnaround time.
    FPGA sizes have reached the million-gate level.
    Modern FPGA designs suffer from long compilation times.


                                                                  Xilinx SPARTAN-6 board
 FPGA placement
    Determines which physical logic block of the FPGA implements each of the
     logic blocks required by the circuit.
    Has a significant impact on performance and routability in nanometer
     circuit designs.
    Its optimization goals are to minimize criteria such as wire length,
     critical delay and area.
    Has become the bottleneck of modern FPGA design flows [Chen’06].
 Up-to-date fast placement algorithms
    For decades, extensive studies have sought to improve the efficiency of
     placement as a standalone phase of the design flow.
    State-of-the-art approaches include multi-core [Ludwin’08], embedding-based
     [Gopalakrishnan’06], partitioning-based [Maidee’05] and multi-level
     [Sankar’99] placement, as well as simulated annealing [Betz’97].
Reusable Info in CAD
 Incremental design for FPGAs
    Design preservation is the key to incremental design.
    Successive versions of a circuit are similar because functional changes and
     optimizations are usually small, so the modified circuit generally keeps a
     topology similar to that of the original circuit [Krishnaswamy’09].


                   Incremental design process for FPGAs (Iteration 1: initial design;
                   Iteration 2: changes due to verification, timing, etc.; Iteration 3 …:
                   optimizations, timing, etc.; final iteration: final design)
Reusable Info in CAD (Cont.)
 Design space exploration for FPGAs
    FPGA designs can be customized in many ways by varying design
     parameters.
    Both local and global similarity exist among the configurations explored
     in a design space.




               Constant multiplier blocks generated by CMU SPIRAL [Puschel’04]
Data Mining
 Overview
    The goal of data mining is to extract patterns and useful information from
     data such as text, graphs and circuits.
    It has been studied extensively since the 1950s and has been applied to
     many domains, such as business, science and health care.
    Graph mining, including graph pattern mining, graph classification and graph
     compression, is an active research area in data mining [Borgwardt’08].
 Graph similarity
    It quantifies the topological similarity between two graphs.
    It has been used in many applications, such as web search
     [Kleinberg’99], social network mapping [Watts’99] and chemical structure
     matching [Hattori’03].
Graph Similarity
 Summary of graph similarity measures
 Measure               Description                                    Time complexity   Global topology
 Isomorphism           Identifying a bijection between the nodes      NP-hard           Yes
 [Pelillo’02]          of two graphs which preserves (directed)
                       adjacency
 Edit distance         Given a cost function on edit operations,      NP-hard           Yes
 [Bunke’99]            determine the minimum-cost
                       transformation from one graph to another
 Common subgraph       Identifying the largest isomorphic             NP-hard           Yes
 [Fernandez’01]        subgraphs of two graphs
 Iterative methods     Two graph elements are similar if their        Cubic             Yes
 [Blondel’04]          neighborhoods are similar
 Statistical methods   Assessing aggregate measures of graph          Linear            No
 [Alberta’02]          structure: degree distribution, diameter,
                       betweenness measures


 Iterative methods
      They have lower computational complexity than the exact measures
       (isomorphism, edit distance, common subgraph) while still capturing
       global topological information.
      They take advantage of graph sparsity.
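A minimal sketch of an iterative similarity method in the style of [Blondel’04], written in Python only for illustration (it is not CSBP's C implementation). It assumes the Blondel et al. update S ← (B S Aᵀ + Bᵀ S A) / ‖·‖_F, started from an all-ones matrix and run for an even number of iterations; the function names are hypothetical. Each pass rescores a node pair from the scores of their successor pairs and predecessor pairs, which is how "two elements are similar if their neighborhoods are similar" becomes a computation.

```python
import math

def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def iterative_similarity(A, B, iterations=20):
    """Blondel-style similarity between the nodes of directed graphs G_A and G_B.

    A, B: adjacency matrices (A[i][j] = 1 if there is an edge i -> j).
    Returns S with S[i][j] = similarity of node i of G_B to node j of G_A.
    Use an even number of iterations (the even iterates converge).
    """
    At, Bt = transpose(A), transpose(B)
    S = [[1.0] * len(A) for _ in range(len(B))]
    for _ in range(iterations):
        out_term = matmul(matmul(B, S), At)   # sums scores over pairs of successors
        in_term = matmul(matmul(Bt, S), A)    # sums scores over pairs of predecessors
        M = [[x + y for x, y in zip(ro, ri)] for ro, ri in zip(out_term, in_term)]
        norm = math.sqrt(sum(x * x for row in M for x in row))
        S = [[x / norm for x in row] for row in M]   # Frobenius normalization
    return S

# G_A: path 0 -> 1 -> 2.  G_B: the same path with its nodes relabelled as 2 -> 0 -> 1.
A = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
B = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
S = iterative_similarity(A, B)
match = [max(range(3), key=lambda j: S[i][j]) for i in range(3)]
print(match)  # → [1, 2, 0]: node i of G_B is most similar to node match[i] of G_A
```

Because G_B is only a relabelling of G_A, the highest score in each row recovers the original node correspondence, which is the property a similarity-based placer relies on.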
Circuit Similarity
 Circuit similarity
     We define circuit similarity to quantify the topological similarity
      between two circuits.
     We adapt iterative graph-similarity methods to circuits.
     Circuit similarity arises in several CAD phases, such as placement, routing
      and verification.
     It can be used to accelerate FPGA design tasks such as incremental
      design and design space exploration.
Outline


      Introduction


      Circuit Similarity-Based Placement

      Experimental Results


      Conclusion and Future Work
Motivating Example
              Circuit similarity algorithm

       Similarity score matrix for G and G’ (rows: nodes of G’; columns: nodes of G)

              V7    V8    V9    V10   V11   V12   V13   V14   V15   V16
       V’7    0.92  0.25  0.48  0.15  0     0     0     0.42  0.06  0
       V’8    0     0.73  0     0     0.05  0     0.39  0     0.17  0.06
       V’9    0     0.39  0     0     0.4   0     0.73  0     0.06  0.48
       V’10   0.48  0     0.89  0.25  0.3   0.12  0.14  0.06  0.33  0.09
       V’11   0     0     0.11  0.48  0     0.86  0     0.36  0.17  0
       V’12   0     0     0.3   0.34  0.64  0.25  0.39  0.34  0.15  0.42
       V’13   0.48  0.25  0.07  0.4   0     0.36  0     0.88  0.06  0
       V’14   0.4   0.39  0.29  0.15  0.15  0.18  0.12  0.46  0.59  0.06
       V’15   0     0.12  0.09  0     0.63  0     0.36  0     0.27  0.82

       Graph G; Graph G’
Motivating Example (Cont.)
 Circuit similarity-based placement
     The initial placement of the new circuit design (G’) is generated by
      computing the similarity between the original (G) and modified circuits
      and finding the corresponding node matching.
     A low-temperature simulated annealing is then applied to further
      refine the results.
     The proposed circuit similarity algorithm can be used to speed up
      placement, enabling faster incremental design and design space
      exploration.
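The matching step can be illustrated on the similarity score matrix of the motivating example. The experiments use the CS2 package [Goldberg’97] for matching; this Python sketch instead swaps in the Hungarian (Kuhn-Munkres) algorithm, a standard exact method for maximum-weight assignment, purely for illustration. The function name and data layout are hypothetical.

```python
def hungarian_max(score):
    """Maximum-weight assignment of each row to a distinct column (needs rows <= columns).

    Kuhn-Munkres with row/column potentials, run on cost = -score.
    Returns assignment[i] = column index matched to row i.
    """
    n, m = len(score), len(score[0])
    assert n <= m, "this sketch assumes no more rows than columns"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 means free)
    way = [0] * (m + 1)   # back-pointers of the current augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:       # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -score[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:         # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

COLS = ["V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16"]
ROWS = ["V'7", "V'8", "V'9", "V'10", "V'11", "V'12", "V'13", "V'14", "V'15"]
S = [  # similarity score matrix of the motivating example (rows: G', columns: G)
    [0.92, 0.25, 0.48, 0.15, 0, 0, 0, 0.42, 0.06, 0],
    [0, 0.73, 0, 0, 0.05, 0, 0.39, 0, 0.17, 0.06],
    [0, 0.39, 0, 0, 0.4, 0, 0.73, 0, 0.06, 0.48],
    [0.48, 0, 0.89, 0.25, 0.3, 0.12, 0.14, 0.06, 0.33, 0.09],
    [0, 0, 0.11, 0.48, 0, 0.86, 0, 0.36, 0.17, 0],
    [0, 0, 0.3, 0.34, 0.64, 0.25, 0.39, 0.34, 0.15, 0.42],
    [0.48, 0.25, 0.07, 0.4, 0, 0.36, 0, 0.88, 0.06, 0],
    [0.4, 0.39, 0.29, 0.15, 0.15, 0.18, 0.12, 0.46, 0.59, 0.06],
    [0, 0.12, 0.09, 0, 0.63, 0, 0.36, 0, 0.27, 0.82],
]
match = hungarian_max(S)
for r, c in zip(ROWS, match):
    print(r, "->", COLS[c])
```

On this particular matrix every row's largest score already falls in a distinct column, so the optimal assignment simply pairs each V’ node with its highest-scoring V node and leaves V10 unmatched. In general the row maxima collide, and the assignment solver is what resolves those conflicts.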
Motivating Example (Cont.)




Placement layouts comparison of circuit “des”: (a) placement of the reference configuration;
(b) initial placement using CS; (c) final placement using CS; (d) initial placement using VPR;
(e) final placement using VPR

 A real example
     For circuit “des”, the reference configuration (synthesized using the
      “resyn3” script in ABC) has 1245 CLBs and 1501 nets, while the new
      configuration (synthesized using the “rwsat2” script in ABC) has
      1215 CLBs and 1471 nets.
     The results show that CSBP successfully finds the internal node
      correspondence.

                  Status of placement results of circuit “des”
                  Wire    Delay     Critical delay    Runtime
                          (E-05)    (E-08)            (s)
      CS-init     306     5.93      -                 -
      VPR-init    1087    14.00     -                 -
      CS-final    237     5.08      8.28              13.38
      VPR-final   221     4.98      10.10             28.42
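The relative differences implied by the table can be worked out directly from its entries (these ratios are derived here from the table, not quoted from the slides):

```latex
\begin{aligned}
\text{Final runtime speedup of CS over VPR} &= 28.42 / 13.38 \approx 2.12\times \\
\text{Final critical-delay change (CS vs.\ VPR)} &= (8.28 - 10.10) / 10.10 \approx -18.0\% \\
\text{Final wire-cost change (CS vs.\ VPR)} &= (237 - 221) / 221 \approx +7.2\% \\
\text{Initial wire-cost reduction (CS vs.\ VPR)} &= 1 - 306 / 1087 \approx 71.8\%
\end{aligned}
```

So on “des”, CS trades roughly 7% more final wire cost for an 18% lower critical delay at about half of VPR's placement runtime.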
Circuit Similarity CAD Flow




CAD flow for incremental design; CAD flow for design space exploration
Circuit Similarity Algorithm
 Iterative similarity algorithm
     We employ the iterative similarity algorithm for undirected molecular
      graphs [Rupp’07].
     We adapt it to directed circuit graphs: the I/O pins are fixed, and the
      similarities of fanin and fanout nodes are computed separately, based on
      constraints specific to circuits.



                                If (|in(vi)| < |in(v’j)| and |out(vi)| < |out(v’j)|)




                                                                    Summary of variables
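The adapted update rule survives only as the condition above, so the following Python sketch is a hypothetical illustration in the spirit of [Rupp’07], not the authors' formula. Each node pair's score mixes a constant node term with the best one-to-one pairing of their fanins and, separately, of their fanouts, each normalized by the larger neighborhood (echoing the |in|/|out| size comparison); pinned I/O pairs are fixed to 1. The weight alpha, the equal fanin/fanout weighting, and all function names are assumptions.

```python
from itertools import permutations

def best_pairing(a, b, X):
    """Largest sum of X[u][w] over one-to-one pairings of node lists a (in G) and b (in G')."""
    if len(a) <= len(b):
        return max(sum(X[u][w] for u, w in zip(a, p)) for p in permutations(b, len(a)))
    return max(sum(X[u][w] for u, w in zip(p, b)) for p in permutations(a, len(b)))

def neighbor_score(a, b, X):
    """Pairing score normalized by the larger neighborhood, so size mismatches are penalized."""
    if not a and not b:
        return 1.0
    return best_pairing(a, b, X) / max(len(a), len(b))

def apply_pins(X, pins):
    """Fix I/O pins: pinned pair (p, q) scores 1, every other pair in row p or column q scores 0."""
    for p, q in pins.items():
        X[p] = [0.0] * len(X[p])
        for row in X:
            row[q] = 0.0
        X[p][q] = 1.0

def circuit_similarity(fin_g, fout_g, fin_h, fout_h, pins, alpha=0.5, iters=10):
    """X[i][j] = similarity of node i of G to node j of G' (hypothetical update rule)."""
    n, m = len(fin_g), len(fin_h)
    X = [[1.0] * m for _ in range(n)]
    apply_pins(X, pins)
    for _ in range(iters):
        # The right-hand side is evaluated entirely from the previous X.
        X = [[(1 - alpha) + alpha * 0.5 * (neighbor_score(fin_g[i], fin_h[j], X)
                                         + neighbor_score(fout_g[i], fout_h[j], X))
              for j in range(m)] for i in range(n)]
        apply_pins(X, pins)
    return X

# Inputs 0, 1 and output 4 are pinned; internal nodes 2 and 3 swap labels in G'.
fin_g, fout_g = [[], [], [0, 1], [2, 1], [3]], [[2], [2, 3], [3], [4], []]
fin_h, fout_h = [[], [], [3, 1], [0, 1], [2]], [[3], [3, 2], [4], [2], []]
X = circuit_similarity(fin_g, fout_g, fin_h, fout_h, pins={0: 0, 1: 1, 4: 4})
print([max(range(5), key=lambda j: X[i][j]) for i in (2, 3)])  # → [3, 2]
```

Brute-force pairing is only viable because circuit nodes have small fanin and fanout; a production version would use a matching solver for larger neighborhoods.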
Performance Enhancement
 Support constraint
    A support of a node is the set of nodes with predefined matchings
     in the transitive fanin or fanout cone of this node.
    Formally, if v ∈ G and v’ ∈ G’, the support constraint requires:

      where β ∈ (0,1].
 Level constraint
    A topological sort and a reverse topological sort can label each
     internal node with two values.
    Formally, if v ∈ G and v’ ∈ G’, the level constraint requires:

      where Bl and Br are two nonnegative integers.

                                           Effectiveness of the pruning techniques
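The exact inequality of the level constraint is not recoverable here. A plausible reading, which this Python sketch assumes, is that a pair (v, v’) is kept only when their forward levels (from a topological sort) differ by at most Bl and their reverse levels (from a reverse topological sort) differ by at most Br. With Bl = Br = 0, as in the turbo setting CS-t, only pairs with identical levels survive. Function names are hypothetical.

```python
from collections import deque

def levels(fanin, fanout):
    """Longest-path level of every node from the sources, via Kahn's topological sort."""
    indeg = [len(f) for f in fanin]
    level = [0] * len(fanin)
    queue = deque(v for v, d in enumerate(indeg) if d == 0)
    while queue:
        v = queue.popleft()
        for w in fanout[v]:
            level[w] = max(level[w], level[v] + 1)
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return level

def candidate_pairs(fin_g, fout_g, fin_h, fout_h, b_l, b_r):
    """Keep (i, j) only if forward levels differ by <= b_l and reverse levels by <= b_r (assumed form)."""
    lg, lh = levels(fin_g, fout_g), levels(fin_h, fout_h)
    rg, rh = levels(fout_g, fin_g), levels(fout_h, fin_h)  # reverse sort: swap fanin and fanout
    return [(i, j) for i in range(len(fin_g)) for j in range(len(fin_h))
            if abs(lg[i] - lh[j]) <= b_l and abs(rg[i] - rh[j]) <= b_r]

# Same toy circuit as G and a relabelled copy G' (internal nodes 2 and 3 swapped).
fin_g, fout_g = [[], [], [0, 1], [2, 1], [3]], [[2], [2, 3], [3], [4], []]
fin_h, fout_h = [[], [], [3, 1], [0, 1], [2]], [[3], [3, 2], [4], [2], []]
print(candidate_pairs(fin_g, fout_g, fin_h, fout_h, 0, 0))
# → [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3), (3, 2), (4, 4)]
```

Only 7 of the 25 node pairs remain, so the similarity iteration has far fewer entries to update; loosening the bounds (for example Bl = Br = 1) readmits near-level pairs at extra cost.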
Outline


      Introduction


      Circuit Similarity-Based Placement


      Experimental Results

      Conclusion and Future Work
Incremental Design
 CAD flow
    A two-iteration CAD flow is used.
    The CSBP flow (a) and the from-scratch flow (b) are compared.
    The optimization “imfs” reduces the number of CLBs by 2%.
 Settings
    Two versions of CSBP are compared: a high-quality version
     (CS) with β = 0.5, inner_num = 1 and Bl = Br = 1, and a turbo version
     (CS-t) with β = 1, inner_num = 0.1 and Bl = Br = 0.
    CSBP is implemented in C and evaluated on the 20 largest
     MCNC benchmarks.
    The results are averaged over 5 runs on a Linux server with a dual-core
     2.19GHz CPU and 5GB of memory.
    The CS2 package [Goldberg’97] is used to solve the maximum matching
     problem.

                                                CAD flow for incremental design
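The slides do not define inner_num. Assuming it is VPR's annealing parameter of the same name, VPR attempts about inner_num · N^(4/3) moves at each temperature, where N is the number of blocks. This Python sketch (hypothetical helper name) shows that CS-t's inner_num = 0.1 cuts the per-temperature move budget tenfold relative to CS.

```python
def moves_per_temperature(inner_num, num_blocks):
    """Moves attempted per temperature, assuming VPR's rule inner_num * N^(4/3)."""
    return round(inner_num * num_blocks ** (4 / 3))

for name, inner_num in [("CS", 1.0), ("CS-t", 0.1)]:
    # 1245 is the CLB count of the "des" reference configuration; VPR would also count I/O blocks.
    print(name, moves_per_temperature(inner_num, 1245))
```

Because the budget scales linearly with inner_num, the turbo setting trades some refinement effort for a proportional cut in annealing work.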
Results
                    Initial placement results
                        Bounding box cost (bb cost) and delay cost are compared.
                        The initial placements generated by CS are much better than VPR’s
                         initial placements and very close to VPR’s final results.

                   Comparisons of initial bb cost; Comparisons of initial delay cost
                   (percentage; CS-init, VPR-final, VPR-init) on s38417, s38584, s298, pdc,
                   alu4, apex2, apex4, ex1010, tseng, ex5p, frisc, seq, des, diffeq, misex3,
                   spla, bigkey, clma, dsip and elliptic

                   CS reduces bb cost by 72% on avg. compared to VPR; CS reduces delay cost by 53% on avg. compared to VPR
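As a rough guide to what "bb cost" measures: VPR's bounding-box cost is built from each net's half-perimeter bounding box (HPWL). This Python sketch computes plain HPWL; VPR additionally weights each net by a fanout-dependent crossing factor and channel-capacity terms, which are omitted here. The helper name is hypothetical.

```python
def bb_cost(nets, pos):
    """Sum of half-perimeter bounding boxes over all nets.

    nets: list of nets, each a list of block ids.
    pos: dict mapping block id -> (x, y) location.
    """
    total = 0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

nets = [[0, 1], [0, 1, 2]]
pos = {0: (0, 0), 1: (2, 1), 2: (1, 3)}
print(bb_cost(nets, pos))  # → 8 (3 for the first net, 5 for the second)
```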
Results (Cont.)
 Post-routing results comparison
      A low-temperature annealing is applied to the initial results.
      Wire length, critical delay and area are compared.
      The results demonstrate the effectiveness of the pruning
       techniques, which do not affect the quality significantly.

                   Wire length; Area; Critical delay (CS-t, CS, VPR on the 20 MCNC benchmarks)
                   CS increases the wire length by 3% on avg.
                   CS increases the area by 2% on avg.
                   CS increases the crit. delay by 6% on avg.
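The refinement step is described only as a low-temperature annealing. This is a generic Python sketch under stated assumptions: swap moves between occupied locations, an HPWL cost, geometric cooling, and a small starting temperature t0 so that mostly downhill moves are accepted. It is not VPR's schedule, and all names and parameter values are hypothetical.

```python
import math
import random

def hpwl(nets, pos):
    """Half-perimeter wirelength of a placement (pos: block -> (x, y))."""
    total = 0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def low_temp_anneal(nets, init, t0=0.5, cooling=0.9, t_min=0.01, moves_per_t=200, seed=0):
    """Refine an initial placement by simulated annealing started at a low temperature."""
    rng = random.Random(seed)
    pos = dict(init)
    blocks = list(pos)
    cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            pos[a], pos[b] = pos[b], pos[a]          # propose: swap two blocks
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                       # accept (Metropolis rule)
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= cooling
    return best, best_cost

# A 4-block chain placed out of order on a 1 x 4 row.
nets = [[0, 1], [1, 2], [2, 3]]
init = {0: (0, 0), 1: (3, 0), 2: (1, 0), 3: (2, 0)}
best, best_cost = low_temp_anneal(nets, init)
print(hpwl(nets, init), "->", best_cost)
```

Starting at a low temperature means few uphill moves are accepted, so the result stays close to the similarity-based initial placement; that is what keeps the refinement cheap compared with annealing from a random start.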
Results (Cont.)
 Runtime comparison
             Only placement time is compared.
             CS-t achieves an average speedup of 31x, and up to 91x.
             Larger speedups are expected as circuits grow larger.

                                    Speedups compared to VPR (CS-t, CS, VPR)
Design Space Exploration
             CAD flow
                    The logic-level and algorithm-level design spaces are
                     studied separately.
                    The CSBP flow (a) and the from-scratch flow (b) are compared.
             Settings
                    The logic-level design space consists of 19 configurations
                     generated by 19 ABC¹ synthesis scripts in abc.rc.
                    The algorithm-level design space consists of 18
                     constant-multiplier configurations generated by CMU SPIRAL
                     [Puschel’04], with the bit width varying from 7 to 25².
                    Both CS and CS-t are evaluated.
                    The benchmarking environment is the same as for the
                     logic-level design space exploration.

                                                          CAD flow for design space exploration

¹ http://www.eecs.berkeley.edu/~alanmi/abc/
² Bit = 16 is excluded because ABC crashed on it.
Logic-level Sample Synthesis Scripts
       Alias          Scripts
       resyn          "b; rw; rwz; b; rwz; b"
       resyn2         "b; rw; rf; b; rw; rwz; b; rfz; rwz; b"
       resyn2a        "b; rw; b; rw; rwz; b; rwz; b"
       src_rw         "st; rw -l; rwz -l; rwz -l"
       src_rs         "st; rs -K 6 -N 2 -l; rs -K 9 -N 2 -l; rs -K 12 -N 2 -l"
       choice         "fraig_store; resyn; fraig_store; resyn2; fraig_store; fraig_restore"
       rwsat          "st; rw -l; b -l; rw -l; rf -l"
       compress       "b -l; rw -l; rwz -l; b -l; rwz -l; b -l"
       share          "st; multi -m; fx; resyn2"




http://www.eecs.berkeley.edu/~alanmi/abc/
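Several scripts in the table are built from other aliases (choice calls resyn and resyn2; share calls resyn2). This Python sketch, with a hypothetical expand helper, inlines those references to show the full command sequence each configuration runs. Short commands such as b or rw are themselves ABC aliases defined in abc.rc, but they are left unexpanded here.

```python
# Scripts copied from the table above (a subset of abc.rc).
ALIASES = {
    "resyn": "b; rw; rwz; b; rwz; b",
    "resyn2": "b; rw; rf; b; rw; rwz; b; rfz; rwz; b",
    "resyn2a": "b; rw; b; rw; rwz; b; rwz; b",
    "src_rw": "st; rw -l; rwz -l; rwz -l",
    "src_rs": "st; rs -K 6 -N 2 -l; rs -K 9 -N 2 -l; rs -K 12 -N 2 -l",
    "choice": "fraig_store; resyn; fraig_store; resyn2; fraig_store; fraig_restore",
    "rwsat": "st; rw -l; b -l; rw -l; rf -l",
    "compress": "b -l; rw -l; rwz -l; b -l; rwz -l; b -l",
    "share": "st; multi -m; fx; resyn2",
}

def expand(name, aliases=ALIASES):
    """Recursively inline any command that is itself an alias in the table."""
    commands = []
    for cmd in aliases[name].split(";"):
        cmd = cmd.strip()
        if cmd in aliases:
            commands.extend(expand(cmd, aliases).split("; "))
        else:
            commands.append(cmd)
    return "; ".join(commands)

print(expand("share"))  # → st; multi -m; fx; b; rw; rf; b; rw; rwz; b; rfz; rwz; b
```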
Logic Level Results
 Initial results comparison
     The number of CLBs and the number of levels vary widely across the
      logic-level design space.
     Circuit “dsip” is shown as an example.
     Bounding box cost and delay cost are compared for the initial placement
      results.

                   Initial bb cost of “dsip”; Initial delay cost of “dsip” (CS, CS-t, VPR)
                   across the 19 synthesis scripts: resyn, resyn2, resyn2a, resyn3, compress,
                   compress2, choice, choice2, rwsat, rwsat2, shake, share, src_rw, src_rs,
                   src_rws, resyn2rs, compress2rs, resyn2rsdc, compress2rsdc
                   CS reduces bb cost by 76% on avg.
                   CS reduces delay cost by 48% on avg.

     Characteristics of logic-level design space
Logic Level Results (Cont.)
                Final placement results
                        Wire length and critical delay of circuit “dsip” are compared.
                        The final results produced by CS and CS-t are close to or better than
                         VPR’s, with a 32% overhead in wire length and a 20% improvement
                         in critical delay.


[Figure: Final wire length comparison of “dsip” (CS-t, CS, VPR; y-axis: percentage; x-axis: 19 designs)]
[Figure: Final critical delay comparison of “dsip” (CS-t, CS, VPR; y-axis: percentage; x-axis: 19 designs)]
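The percentages on this slide compare CS and CS-t against VPR. Below is a minimal sketch of how such relative figures can be computed. It assumes each result is normalized to the VPR result for the same design and then averaged arithmetically; the slides do not state the exact formula, and the function names and sample numbers are hypothetical.

```python
def relative_change(candidate: float, baseline: float) -> float:
    """Relative change of `candidate` with respect to `baseline` (e.g. VPR).

    A positive result is an overhead (candidate is larger); a negative
    result is an improvement when smaller is better (wire length, delay).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


def mean_relative_change(candidates: list[float], baselines: list[float]) -> float:
    """Arithmetic mean of per-design relative changes (assumed averaging rule)."""
    if len(candidates) != len(baselines) or not candidates:
        raise ValueError("need equally sized, non-empty lists")
    changes = [relative_change(c, b) for c, b in zip(candidates, baselines)]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Hypothetical wire lengths (CS vs. VPR) and critical delays in seconds.
    wl_cs, wl_vpr = [132.0, 128.0], [100.0, 100.0]
    delay_cs, delay_vpr = [0.8e-7, 1.6e-7], [1.0e-7, 2.0e-7]
    print(f"wire length: {mean_relative_change(wl_cs, wl_vpr):+.0%}")
    print(f"critical delay: {mean_relative_change(delay_cs, delay_vpr):+.0%}")
```

A positive mean reads as an overhead and a negative mean as an improvement. A geometric mean of the ratios would be an equally reasonable choice.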
Logic Level Results (Cont.)
 Design space shape characterization
         We compare the minimal, median and maximal wire length and critical delay produced by CS, CS-t and VPR.
         We also compare the shapes of each configuration over 19 designs.
         The almost identical curves show that CSBP is able to accurately depict the shape of a design space.
[Figure: Shape of final wire length of circuit “dsip” (vpr, cs, cs-t; x-axis: 19 designs)]
[Figure: Shape of minimal wire length of 20 circuits over 19 designs (vpr-min, cs-min, cs-t-min)]
[Figure: Shape of minimal critical delay of 20 circuits over 19 designs (vpr-min, cs-min, cs-t-min)]
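The slide summarizes each configuration by its minimal, median and maximal result and then compares curve shapes visually. Below is a minimal sketch of both steps. The slides give no numeric shape metric, so Pearson correlation between two curves is an assumed stand-in; the function names and data are hypothetical.

```python
import math
from statistics import median


def summarize(results: dict[str, list[float]]) -> dict[str, tuple[float, float, float]]:
    """(minimum, median, maximum) of each configuration's results over all designs."""
    return {cfg: (min(vals), median(vals), max(vals)) for cfg, vals in results.items()}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two curves sampled at the same designs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two curves of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant curve has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical final wire lengths of one circuit under three designs.
    results = {"vpr": [100.0, 120.0, 110.0], "cs": [105.0, 130.0, 112.0]}
    print(summarize(results))
    print(f"shape correlation cs vs. vpr: {pearson(results['cs'], results['vpr']):.3f}")
```

A correlation close to 1 means the two curves rise and fall together across designs, even if one is offset from the other. This matches the visual notion of "almost identical" shapes.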
Logic Level Results (Cont.)
                   Runtime comparison
                                    Only placement time is compared.
                                    CS-t achieves 30x speedup on
                                     average, with up to 100x.
                                    In practice, one can take
                                     advantage of the significant
                                     speedup of CS-t to perform quick
                                     design space exploration.
[Figure: Speedups compared to VPR (CS, CS-t, VPR; y-axis: speedup; x-axis: 20 circuits)]
[Table: Runtime comparison (times marked “*” are measured with a timeout)]
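Speedup here means VPR placement time divided by CS or CS-t placement time, per circuit. Below is a minimal sketch of that calculation. The slides do not say whether "on average" is an arithmetic or a geometric mean, so both are printed; the function name and times are hypothetical.

```python
from statistics import geometric_mean, mean


def speedups(baseline_times: list[float], new_times: list[float]) -> list[float]:
    """Per-circuit speedup = baseline placement time / new placement time."""
    if len(baseline_times) != len(new_times):
        raise ValueError("time lists must have the same length")
    if any(t <= 0 for t in new_times):
        raise ValueError("placement times must be positive")
    return [b / n for b, n in zip(baseline_times, new_times)]


if __name__ == "__main__":
    # Hypothetical placement times in seconds (VPR vs. CS-t).
    vpr = [100.0, 60.0, 300.0]
    cs_t = [2.0, 3.0, 30.0]
    s = speedups(vpr, cs_t)
    print(f"arithmetic mean {mean(s):.1f}x, "
          f"geometric mean {geometric_mean(s):.1f}x, max {max(s):.1f}x")
```

The table only states that some times were measured with a timeout. If a baseline run was stopped at that timeout, its recorded time is a lower bound, so the corresponding speedup is also a lower bound.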
Algorithm Level Results
     Experimental settings
            The algorithm-level design is a constant multiplier.
            The design parameter explored in our experiments is the number of fractional bits, varying from 7 to 25¹.
            CMU SPIRAL is used to generate the RTL design based on the Hcub algorithm [Voronenko’07].
                                                          Characteristics of algorithm-level design space generated by CMU SPIRAL
     Experimental results
            The initial and final placement results are similar to those of the logic-level design space exploration.
            CS and CS-t achieve 7x and 30x speedups over VPR, respectively.




[Figure: An example of a constant parallel multiplier]
¹ Bits = 16 was excluded because ABC crashed.
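The explored parameter is the number of fractional bits of the multiplier constant. SPIRAL's Hcub algorithm builds adder graphs that share intermediate results, and this sketch does not reproduce it. As a simpler stand-in technique, the code below uses canonical signed digit (CSD) recoding to show how a constant quantized to f fractional bits becomes a multiplierless shift-and-add circuit. More fractional bits typically mean more non-zero digits and therefore more adders, which is what makes each bit width a distinct design point. All names and the example constant are hypothetical.

```python
def csd_digits(n: int) -> list[int]:
    """Canonical signed digit (non-adjacent form), least significant digit first.

    Each digit is -1, 0 or +1, no two adjacent digits are non-zero,
    and sum(d * 2**i) == n.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def quantize(c: float, frac_bits: int) -> int:
    """Fixed-point integer for constant c with `frac_bits` fractional bits."""
    return round(c * (1 << frac_bits))


def shift_add_multiply(x: int, digits: list[int]) -> int:
    """Multiply x by the constant encoded in `digits` using only shifts and adds."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def adder_count(digits: list[int]) -> int:
    """Adders/subtractors needed by a plain CSD shift-and-add chain."""
    nonzero = sum(1 for d in digits if d != 0)
    return max(nonzero - 1, 0)


if __name__ == "__main__":
    c = 0.7071067811865476  # hypothetical constant (about 1/sqrt(2))
    x = 1234                # hypothetical integer input sample
    for f in (7, 12, 25):
        q = quantize(c, f)
        digits = csd_digits(q)
        product = shift_add_multiply(x, digits)
        assert product == x * q  # shift-and-add equals ordinary multiplication
        print(f"frac_bits={f:2d} q={q} adders={adder_count(digits)} "
              f"x*c~{product / (1 << f):.6f}")
```

The product is a fixed-point value scaled by 2**f, so dividing by 2**f recovers an approximation of x * c.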
Algorithm Level Results (Cont.)
                                 Wire length-delay space comparison
                                           The Pareto points, which are the optimal configurations in a design space, are of most interest to IC designers.
                                           CS and VPR find the same Pareto points.
                                           Bits = 24 is used as the reference circuit.




[Figure: Wire length-delay space of VPR (x-axis: wire length; y-axis: estimated critical delay; points labeled B7 to B25 by fractional bits)]
[Figure: Wire length-delay space of CS (x-axis: wire length; y-axis: estimated critical delay; points labeled B7 to B25 by fractional bits)]
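A configuration is a Pareto point when no other configuration is at least as good in both wire length and critical delay and strictly better in one of them. Below is a minimal sketch that extracts the Pareto points from each tool's wire length-delay results and checks whether two tools agree. The function names and the sample coordinates are hypothetical, not values read from the plots.

```python
Point = tuple[float, float]  # (wire length, estimated critical delay), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: dict[str, Point]) -> list[str]:
    """Labels of non-dominated configurations, ordered by wire length."""
    front = [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]
    return sorted(front, key=lambda name: points[name])


if __name__ == "__main__":
    # Hypothetical (wire length, delay) results labelled by fractional bits.
    vpr = {"B7": (40.0, 1.6e-7), "B9": (70.0, 1.5e-7),
           "B10": (90.0, 1.7e-7), "B12": (60.0, 2.4e-7)}
    cs = {"B7": (55.0, 1.8e-7), "B9": (95.0, 1.7e-7),
          "B10": (120.0, 1.9e-7), "B12": (80.0, 2.6e-7)}
    print(pareto_front(vpr), pareto_front(cs))
    print("same Pareto points:", set(pareto_front(vpr)) == set(pareto_front(cs)))
```

The two tools can report different absolute wire lengths and delays and still select the same Pareto points. Only the relative ordering of configurations matters for this check.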
Future Work
 Improvement to CSBP
     Integrate predefined matchings, such as name matching, into CSBP to further improve both the efficiency and the quality of the design.
 Other applications
     Study the effectiveness of applying the circuit similarity algorithm to other applications, such as routing and sequential verification for FPGAs.
Conclusion
 Proposed an efficient circuit similarity algorithm
 Developed CSBP, a fast circuit similarity-based placement for
  FPGAs
     Applied CSBP to incremental design and design space exploration.
     Open-source tool available at:
      http://webdocs.cs.ualberta.ca/~xshi/soft.html
 Applied CSBP to incremental design for FPGAs
     CSBP is able to reduce engineering effort by capturing the similarity with previous design iterations.
     CSBP is 31x faster than VPR.
 Applied CSBP to design space exploration for FPGAs
     CSBP can precisely depict the shape of a design space and pinpoint the
      optimal designs.
     CSBP is 30x faster than VPR.

Más contenido relacionado

La actualidad más candente

Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
Optimization of distributed generation of renewable energy sources by intelli...
Optimization of distributed generation of renewable energy sources by intelli...Optimization of distributed generation of renewable energy sources by intelli...
Optimization of distributed generation of renewable energy sources by intelli...Beniamino Murgante
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Editor IJMTER
 
vasp-gpu on Balena: Usage and Some Benchmarks
vasp-gpu on Balena: Usage and Some Benchmarksvasp-gpu on Balena: Usage and Some Benchmarks
vasp-gpu on Balena: Usage and Some BenchmarksJonathan Skelton
 
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM System
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM SystemA Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM System
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM SystemIJAAS Team
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2Naughty Dog
 
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...IDES Editor
 
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01Hemant Jha
 
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...Masahiro Kanazaki
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDAIJERA Editor
 
A Power Efficient Architecture for 2-D Discrete Wavelet Transform
A Power Efficient Architecture for 2-D Discrete Wavelet TransformA Power Efficient Architecture for 2-D Discrete Wavelet Transform
A Power Efficient Architecture for 2-D Discrete Wavelet TransformRahul Jain
 

La actualidad más candente (14)

Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architectures
 
Optimization of distributed generation of renewable energy sources by intelli...
Optimization of distributed generation of renewable energy sources by intelli...Optimization of distributed generation of renewable energy sources by intelli...
Optimization of distributed generation of renewable energy sources by intelli...
 
Hz2514321439
Hz2514321439Hz2514321439
Hz2514321439
 
10.1.1.2.9988
10.1.1.2.998810.1.1.2.9988
10.1.1.2.9988
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
 
vasp-gpu on Balena: Usage and Some Benchmarks
vasp-gpu on Balena: Usage and Some Benchmarksvasp-gpu on Balena: Usage and Some Benchmarks
vasp-gpu on Balena: Usage and Some Benchmarks
 
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM System
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM SystemA Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM System
A Novel CAZAC Sequence Based Timing Synchronization Scheme for OFDM System
 
SPU Optimizations - Part 2
SPU Optimizations - Part 2SPU Optimizations - Part 2
SPU Optimizations - Part 2
 
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...
On Data Mining in Inverse Scattering Problems: Neural Networks Applied to GPR...
 
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01
Vlsiphysicaldesignautomationonpartitioning 120219012744-phpapp01
 
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...
MULTI-POINT DESIGN OF A SUPERSONIC WING USING MODIFIED PARSEC AIRFOIL REPRESE...
 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
 
MTP paper
MTP paperMTP paper
MTP paper
 
A Power Efficient Architecture for 2-D Discrete Wavelet Transform
A Power Efficient Architecture for 2-D Discrete Wavelet TransformA Power Efficient Architecture for 2-D Discrete Wavelet Transform
A Power Efficient Architecture for 2-D Discrete Wavelet Transform
 

Featured

Gradient-Based Multi-Objective Optimization Technology (eArtius, Inc.)
The Multi-Objective Genetic Algorithm Based Techniques for Intrusion Detection (ijcsse)
Gary Yen: "Multi-objective Optimization and Performance Metrics Ensemble" (ieee_cis_cyprus)
Multi-Objective Optimization in Rule-based Design Space Exploration (ASE 2014) (hani_abdeen)
Multi-Objective Evolutionary Algorithms (Song Gao)
Method of solving multi objective optimization problem in the presence of unc... (eSAT Journals)
Multi objective optimization and Benchmark functions result (Piyush Agarwal)
Cyber infrastructure in engineering design (Amogh Mundhekar)
Pareto optimal (rmpas)
Multiobjective optimization and trade offs using pareto optimality (Amogh Mundhekar)
Multi Objective Optimization (Nawroz University)
 


Similar to CSBP: A Fast Circuit Similarity-Based Placement for FPGA Incremental Design and Design Space Exploration

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tr... (IJMER)
SEMINAR[2].pptx automatic circuit design (ShaelMalik)
EFFICIENT ABSOLUTE DIFFERENCE CIRCUIT FOR SAD COMPUTATION ON FPGA (VLSICS Design)
Implementation of Non-restoring Reversible Divider Using a Quantum-Dot Cellul... (VIT-AP University)
A Novel Approaches For Chromatic Squander Less Visceral Coding Techniques Usi... (IJERA Editor)
An Improved Optimization Techniques for Parallel Prefix Adder using FPGA (IJMER)
Learning Convolutional Neural Networks for Graphs (pione30)
Modification on Energy Efficient Design of DVB-T2 Constellation De-mapper (IJERA Editor)
Kernel Descriptors for Visual Recognition (Priyatham Bollimpalli)
Minimum Cost Fault Tolerant Adder Circuits in Reversible Logic Synthesis (Sajib Mitra)
Implementation and Comparison of Efficient 16-Bit SQRT CSLA Using Parity Pres... (IJERA Editor)
A Novel and Efficient Design for Squaring Units by Quantum-Dot Cellular Automata (VIT-AP University)
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines (Intel® Software)
Design and Implementation of Different types of Carry skip adder (IRJET Journal)
Scalable and Adaptive Graph Querying with MapReduce (Kyong-Ha Lee)
ENERGY AND LATENCY AWARE APPLICATION MAPPING ALGORITHM & OPTIMIZATION FOR HOM... (cscpconf)
 


CSBP: A Fast Circuit Similarity-Based Placement for FPGA Incremental Design and Design Space Exploration

  • 1. CSBP: A Fast Circuit Similarity-Based Placement for FPGA Incremental Design and Design Space Exploration
      Xiaoyu Shi¹, Dahua Zeng¹, Yu Hu², Guohui Lin¹, Osmar R. Zaiane¹
      ¹ Dept. of Computing Science, University of Alberta
      ² Dept. of Electrical and Computer Engineering, University of Alberta
      Presented by Xiaoyu Shi
      Please address comments to bryanhu@ece.ualberta.ca
  • 2. Outline
      • Introduction
      • Circuit Similarity-Based Placement
      • Experimental Results
      • Conclusion and Future Work
  • 3. Introduction
      • Field Programmable Gate Array (FPGA)
          ◦ Ease of design, low start-up costs and fast manufacturing turnaround time.
          ◦ FPGA sizes have reached the million-gate level.
          ◦ Modern FPGA designs suffer from long compilation times.
      [Figure: Xilinx SPARTAN-6 board]
      • FPGA placement
          ◦ Determines which physical logic block within an FPGA implements each of the logic blocks required by the circuit.
          ◦ Has a significant impact on performance and routability in nanometer circuit designs.
          ◦ The optimization goals are to minimize criteria such as wire length, critical-path delay and area.
          ◦ Has become the bottleneck of modern FPGA circuit design [Chen’06].
      • Up-to-date fast placement algorithms
          ◦ Placement efficiency has been studied extensively for decades as a standalone synthesis phase.
          ◦ State-of-the-art work includes multi-core [Ludwin’08], embedding-based [Gopalakrishnan’06], partitioning-based [Maidee’05], multi-level [Sankar’99] and simulated annealing [Betz’97] approaches.
  • 4. Reusable Info in CAD
      • Incremental design for FPGAs
          ◦ Design preservation is the key to incremental design.
          ◦ Similarity among circuits exists because functional changes or optimizations are small, and they generally leave the modified circuit with a topology similar to that of the original circuit [Krishnaswamy’09].
      [Figure: Incremental design process for FPGAs. An initial design passes through iteration 1, iteration 2, iteration 3, … up to a final iteration and the final design, driven by changes due to verification, timing, optimizations, etc.]
  • 5. Reusable Info in CAD (Cont.)
      • Design space exploration for FPGAs
          ◦ FPGA design offers a variety of customizations by varying design parameters.
          ◦ Both local and global similarity exist in design space exploration.
      [Figure: Constant multiplier blocks by CMU SPIRAL [Puschel’04]. An initial design evolves into the final design through changes due to verification, timing, optimizations, etc.]
  • 6. Data Mining
      • Overview
          ◦ The key goal of data mining is to extract patterns and useful information from data such as text, graphs and circuits.
          ◦ It has been studied extensively since the 1950s and has been widely applied in domains such as business, science and health care.
          ◦ Graph mining, including graph pattern mining, graph classification and graph compression, is a hot research area in data mining [Borgwardt’08].
      • Graph similarity
          ◦ Quantitatively defines the topological similarity between two graphs.
          ◦ It has been used in many applications, such as web searching [Kleinberg’99], social network mapping [Watts’99] and chemical structure matching [Hattori’03].
  • 7. Graph Similarity
      Summary of graph similarity measures:

        Measure | Description | Time complexity | Global topology
        --------|-------------|-----------------|----------------
        Isomorphism [Pelillo’02] | Identify a bijection between the nodes of two graphs that preserves (directed) adjacency | No known polynomial-time algorithm (not known to be NP-hard) | Yes
        Edit distance [Bunke’99] | Given a cost function on edit operations, determine the minimum-cost transformation from one graph to another | NP-Hard | Yes
        Common subgraph [Fernandez’01] | Identify the largest isomorphic subgraphs of two graphs | NP-Hard | Yes
        Iterative methods [Blondel’04] | Two graph elements are similar if their neighborhoods are similar | Cubic | Yes
        Statistical methods [Alberta’02] | Assess aggregate measures of graph structure: degree distribution, diameter, betweenness | Linear | No

      • Iterative methods
          ◦ They have lower computational complexity and still consider global topological information.
          ◦ They take advantage of graph sparsity.
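The iterative family in the table (the [Blondel’04] row) can be illustrated with a short sketch. It assumes plain 0/1 adjacency matrices A (for G_A) and B (for G_B) and uses the Blondel-style update S ← (B·S·Aᵀ + Bᵀ·S·A) / ‖B·S·Aᵀ + Bᵀ·S·A‖_F. Under this rule, two nodes score highly when their successors score highly and when their predecessors score highly. The function name `blondel_similarity` and the default of 10 iterations are illustrative choices. This is not the CSBP implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def blondel_similarity(
    adj_a: Sequence[Sequence[int]],
    adj_b: Sequence[Sequence[int]],
    iterations: int = 10,
) -> Matrix:
    """Return S with S[i][j] = similarity of node i in G_B to node j in G_A.

    adj_x[u][w] is nonzero when graph x has a directed edge u -> w.
    """
    if iterations % 2:
        iterations += 1  # Blondel et al. show the even iterates converge
    n_a, n_b = len(adj_a), len(adj_b)
    succ_a = [[w for w in range(n_a) if adj_a[u][w]] for u in range(n_a)]
    pred_a = [[w for w in range(n_a) if adj_a[w][u]] for u in range(n_a)]
    succ_b = [[w for w in range(n_b) if adj_b[u][w]] for u in range(n_b)]
    pred_b = [[w for w in range(n_b) if adj_b[w][u]] for u in range(n_b)]

    s: Matrix = [[1.0] * n_a for _ in range(n_b)]  # S_0 = all ones
    for _ in range(iterations):
        nxt = [[0.0] * n_a for _ in range(n_b)]
        for i in range(n_b):
            for j in range(n_a):
                # (B S A^T)[i][j]: sum of scores over successor pairs (fanout)
                out_term = sum(s[k][l] for k in succ_b[i] for l in succ_a[j])
                # (B^T S A)[i][j]: sum of scores over predecessor pairs (fanin)
                in_term = sum(s[k][l] for k in pred_b[i] for l in pred_a[j])
                nxt[i][j] = out_term + in_term
        norm = math.sqrt(sum(x * x for row in nxt for x in row))
        if norm == 0.0:  # no edges to propagate scores through
            return nxt
        s = [[x / norm for x in row] for row in nxt]  # Frobenius normalization
    return s


if __name__ == "__main__":
    path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # directed path 0 -> 1 -> 2
    for row in blondel_similarity(path, path):
        print([round(x, 3) for x in row])
```

When a small directed path is compared with itself, each node ends up most similar to itself. The matching step later relies on exactly this diagonal dominance. Each iteration costs time proportional to the number of neighbor pairs, so sparse circuit graphs keep it cheap.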
  • 8. Circuit Similarity
      • Circuit similarity
          ◦ We define circuit similarity to describe the similar topological structures shared by two circuits.
          ◦ We adapt the iterative methods from graph similarity.
          ◦ Circuit similarity exists in several CAD phases, such as placement, routing and verification.
          ◦ It can be widely used to accelerate FPGA design tasks such as incremental design and design space exploration.
  • 9. Outline
      • Introduction
      • Circuit Similarity-Based Placement
      • Experimental Results
      • Conclusion and Future Work
  • 10. Motivating Example
      • Circuit similarity algorithm
      [Figures: Graph G and Graph G’]
      Similarity score matrix for G and G’ (rows: nodes of G’; columns: nodes of G):

                      V7    V8    V9    V10   V11   V12   V13   V14   V15   V16
                V’7   0.92  0.25  0.48  0.15  0     0     0     0.42  0.06  0
                V’8   0     0.73  0     0     0.05  0     0.39  0     0.17  0.06
                V’9   0     0.39  0     0     0.4   0     0.73  0     0.06  0.48
                V’10  0.48  0     0.89  0.25  0.3   0.12  0.14  0.06  0.33  0.09
                V’11  0     0     0.11  0.48  0     0.86  0     0.36  0.17  0
                V’12  0     0     0.3   0.34  0.64  0.25  0.39  0.34  0.15  0.42
                V’13  0.48  0.25  0.07  0.4   0     0.36  0     0.88  0.06  0
                V’14  0.4   0.39  0.29  0.15  0.15  0.18  0.12  0.46  0.59  0.06
                V’15  0     0.12  0.09  0     0.63  0     0.36  0     0.27  0.82
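One simple way to turn a score matrix like the slide 10 matrix into a node correspondence is greedy matching: repeatedly take the highest remaining score whose row and column are both still free. This is a hedged sketch and not the method the slides use; slide 17 states that the CS2 package solves a maximum matching problem. `greedy_match` is a hypothetical helper, and the scores are copied from the slide 10 matrix.

```python
from typing import Dict, List, Sequence


def greedy_match(scores: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Pair rows with columns greedily by descending score; zero scores are never matched."""
    candidates = sorted(
        ((s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row) if s > 0),
        key=lambda t: -t[0],  # stable sort: ties keep row-major order
    )
    used_rows, used_cols = set(), set()
    match: Dict[int, int] = {}
    for _, r, c in candidates:
        if r not in used_rows and c not in used_cols:
            match[r] = c
            used_rows.add(r)
            used_cols.add(c)
    return match


# Scores from the slide 10 matrix: rows are nodes of G', columns are nodes of G.
ROWS = [f"V'{k}" for k in range(7, 16)]
COLS = [f"V{k}" for k in range(7, 17)]
SCORES: List[List[float]] = [
    [0.92, 0.25, 0.48, 0.15, 0, 0, 0, 0.42, 0.06, 0],
    [0, 0.73, 0, 0, 0.05, 0, 0.39, 0, 0.17, 0.06],
    [0, 0.39, 0, 0, 0.4, 0, 0.73, 0, 0.06, 0.48],
    [0.48, 0, 0.89, 0.25, 0.3, 0.12, 0.14, 0.06, 0.33, 0.09],
    [0, 0, 0.11, 0.48, 0, 0.86, 0, 0.36, 0.17, 0],
    [0, 0, 0.3, 0.34, 0.64, 0.25, 0.39, 0.34, 0.15, 0.42],
    [0.48, 0.25, 0.07, 0.4, 0, 0.36, 0, 0.88, 0.06, 0],
    [0.4, 0.39, 0.29, 0.15, 0.15, 0.18, 0.12, 0.46, 0.59, 0.06],
    [0, 0.12, 0.09, 0, 0.63, 0, 0.36, 0, 0.27, 0.82],
]

if __name__ == "__main__":
    for r, c in sorted(greedy_match(SCORES).items()):
        print(f"{ROWS[r]} -> {COLS[c]}  ({SCORES[r][c]})")
```

In this particular matrix, every row's maximum falls in a different column, so the greedy pairing coincides with an optimal assignment (for example V’9 → V13 and V’12 → V11). In general, greedy matching can be suboptimal, which is one reason to use an exact matching solver.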
  • 11. Motivating Example (Cont.)
      • Circuit similarity-based placement
          ◦ The initial placement of the new circuit design (G’) is generated by computing the similarity between the original (G) and modified (G’) circuits and finding the corresponding node matching.
          ◦ A low-temperature simulated annealing then refines the result.
          ◦ The proposed circuit similarity algorithm can be used to speed up placement, which allows faster incremental design and design space exploration.
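The refinement step can be sketched as a low-temperature simulated annealing over block swaps with a half-perimeter wirelength (HPWL) cost. Everything here is an assumption for illustration. Locations are plain (x, y) tuples, moves only swap two already-placed blocks, and the cost is recomputed from scratch on every move. The temperature schedule (`t0`, `alpha`, `moves_per_t`, `t_min`) is hypothetical and is not VPR's adaptive schedule. Starting at a low temperature keeps the anneal close to the similarity-derived initial placement.

```python
import math
import random
from typing import Dict, List, Tuple

Loc = Tuple[int, int]


def hpwl(placement: Dict[str, Loc], nets: List[List[str]]) -> int:
    """Half-perimeter wirelength: sum of bounding-box width + height over all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def low_temp_anneal(
    placement: Dict[str, Loc],
    nets: List[List[str]],
    t0: float = 0.5,       # hypothetical low starting temperature
    alpha: float = 0.95,   # geometric cooling factor
    moves_per_t: int = 200,
    t_min: float = 0.01,
    seed: int = 0,
) -> Tuple[Dict[str, Loc], int]:
    """Refine a placement by swapping pairs of blocks; return the best placement seen."""
    rng = random.Random(seed)
    cur = dict(placement)
    cur_cost = hpwl(cur, nets)
    best, best_cost = dict(cur), cur_cost
    blocks = list(cur)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            cur[a], cur[b] = cur[b], cur[a]  # propose a swap
            new_cost = hpwl(cur, nets)
            delta = new_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost = new_cost  # accept (Metropolis criterion)
                if cur_cost < best_cost:
                    best, best_cost = dict(cur), cur_cost
            else:
                cur[a], cur[b] = cur[b], cur[a]  # reject: undo the swap
        t *= alpha
    return best, best_cost


if __name__ == "__main__":
    # Toy starting placement, e.g. copied from matched nodes of the reference design.
    start = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
    nets = [["a", "b"], ["b", "c"], ["c", "d"]]
    placed, cost = low_temp_anneal(start, nets)
    print(hpwl(start, nets), "->", cost, placed)
```

The function returns the best placement it has seen, so the result never has a higher HPWL than the starting placement. A production placer would also move blocks into empty sites and update the cost incrementally.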
  • 12. Motivating Example (Cont.)
    [Figure: Placement layouts comparison of circuit "des": (a) placement of the reference configuration; (b) initial placement using CS; (c) final placement using CS; (d) initial placement using VPR; (e) final placement using VPR]
    - A real example: for circuit "des", the reference configuration (synthesized using the "resyn3" script in ABC) has 1245 CLBs and 1501 nets, while the new configuration (synthesized using the "rwsat2" script in ABC) has 1215 CLBs and 1471 nets.
    - The results show that CSBP successfully finds the internal node correspondence.
    Status of placement results of circuit "des":
                  Wire    Delay (E-05)   Critical delay (E-08)   Runtime (s)
      CS-init     306     5.93           -                       -
      VPR-init    1087    14.00          -                       -
      CS-final    237     5.08           8.28                    13.38
      VPR-final   221     4.98           10.10                   28.42
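The relative differences in the "des" table can be worked out directly. This short Python sketch only does the arithmetic on the reported numbers; it assumes the column reading of the table (wire, delay, critical delay, runtime) and does not reinterpret what each cost measures.

```python
# Relative comparison of the "des" placement results reported in the table.
# Columns: wire, delay (E-05), critical delay (E-08), runtime (s).
results = {
    "CS-init": (306, 5.93, None, None),
    "VPR-init": (1087, 14.00, None, None),
    "CS-final": (237, 5.08, 8.28, 13.38),
    "VPR-final": (221, 4.98, 10.10, 28.42),
}


def pct_change(new, ref):
    """Percentage change of `new` relative to `ref` (negative = reduction)."""
    return 100.0 * (new - ref) / ref


cs_i, vpr_i = results["CS-init"], results["VPR-init"]
cs_f, vpr_f = results["CS-final"], results["VPR-final"]
init_wire = pct_change(cs_i[0], vpr_i[0])
init_delay = pct_change(cs_i[1], vpr_i[1])
final_wire = pct_change(cs_f[0], vpr_f[0])
final_crit = pct_change(cs_f[2], vpr_f[2])
speedup = vpr_f[3] / cs_f[3]

print(f"initial wire {init_wire:+.1f}%, initial delay {init_delay:+.1f}%")
print(f"final wire {final_wire:+.1f}%, final critical delay {final_crit:+.1f}%")
print(f"runtime speedup {speedup:.2f}x")
```

For this one circuit, CS's initial placement is about 72% lower in the wire column and about 58% lower in the delay column than VPR's initial placement; after refinement, CS is about 7% higher in wire, about 18% lower in critical delay, and roughly 2.1x faster.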
  • 13. Circuit Similarity CAD Flow
    [Figures: CAD flow for incremental design; CAD flow for design space exploration]
  • 14. Circuit Similarity Algorithm
    Iterative similarity algorithm:
    - We employ the iterative similarity algorithm for undirected molecular graphs [Rupp'07].
    - Based on constraints unique to circuits, we adapt it to directed circuit graphs: the I/O pins are fixed, and the similarities of fanin and fanout nodes are computed separately.
    [Equation: similarity update rule; one case applies if |in(v_i)| < |in(v'_j)| and |out(v_i)| < |out(v'_j)|]
    [Table: Summary of variables]
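CSBP's exact update rule appears on the slide as an equation that did not survive extraction. As an illustration only, here is a hypothetical Python sketch of the ingredients the slide names: directed graphs, fixed I/O pins, and separate fanin and fanout terms. The best-match averaging in `neighbor_score` and the equal 0.5 weighting of the two terms are assumptions (a greedy simplification), not necessarily the neighbor matching used in [Rupp'07] or by the authors.

```python
def neighbor_score(n1, n2, s):
    """Average best-match score between two neighbor lists (fanins or fanouts).

    Each neighbor on the smaller side takes its best-scoring partner on the
    other side; the sum is divided by the larger list size, so the result
    stays in [0, 1] and penalizes a mismatch in neighbor counts.
    """
    if not n1 and not n2:
        return 1.0
    if not n1 or not n2:
        return 0.0
    if len(n1) <= len(n2):
        total = sum(max(s[(x, y)] for y in n2) for x in n1)
    else:
        total = sum(max(s[(x, y)] for x in n1) for y in n2)
    return total / max(len(n1), len(n2))


def pin_constraints(g1, g2, pins):
    """Fix I/O pins: a pin matches only the same-named pin in the other circuit."""
    fixed = {}
    for a in g1:
        for b in g2:
            if a in pins or b in pins:
                fixed[(a, b)] = 1.0 if (a == b and a in pins) else 0.0
    return fixed


def iterative_similarity(g1, g2, pins, iters=10):
    """g1, g2 map node -> (fanins, fanouts). Returns scores for all node pairs."""
    fixed = pin_constraints(g1, g2, pins)
    s = {(a, b): fixed.get((a, b), 1.0) for a in g1 for b in g2}
    for _ in range(iters):
        new = {}
        for a, (fin1, fout1) in g1.items():
            for b, (fin2, fout2) in g2.items():
                if (a, b) in fixed:
                    new[(a, b)] = fixed[(a, b)]
                else:  # separate fanin and fanout terms, equally weighted
                    new[(a, b)] = 0.5 * (neighbor_score(fin1, fin2, s)
                                         + neighbor_score(fout1, fout2, s))
        if new == s:  # converged
            break
        s = new
    return s


# Two tiny circuits: inputs a, b feed one gate, then a second gate, then output y.
G = {"a": ([], ["g"]), "b": ([], ["g"]), "g": (["a", "b"], ["h"]),
     "h": (["g"], ["y"]), "y": (["h"], [])}
G2 = {"a": ([], ["p"]), "b": ([], ["p"]), "p": (["a", "b"], ["q"]),
      "q": (["p"], ["y"]), "y": (["q"], [])}
scores = iterative_similarity(G, G2, pins={"a", "b", "y"})
print(scores[("g", "p")], scores[("g", "q")])  # 1.0 0.0
```

Fixing the pins anchors the iteration: internal nodes acquire high scores only if their fanin and fanout cones lead back to matching pins, which is why g pairs with p rather than q.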
  • 15. Performance Enhancement
    Support constraint:
    - The support of a node is the set of nodes with predefined matchings in the transitive fanin or fanout cone of this node.
    - Formally, if v ∈ G and v' ∈ G', the support constraint requires: [equation not shown], where β ∈ (0,1].
    Level constraint:
    - A topological sort and a reverse topological sort label each internal node with two values.
    - Formally, if v ∈ G and v' ∈ G', the level constraint requires: [equation not shown], where Bl and Br are two nonnegative integers.
    [Table: Effectiveness of the pruning techniques]
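The slide gives the intent of both pruning constraints, but the formulas themselves did not survive. The Python sketch below assumes hypothetical forms: |l(v) − l(v')| ≤ Bl and |r(v) − r(v')| ≤ Br for the level constraint, and |S(v) ∩ S(v')| ≥ β · max(|S(v)|, |S(v')|) for the support constraint, with I/O pins as the predefined matchings. Under these assumed forms, β = 1 and Bl = Br = 0 would be the most aggressive pruning.

```python
from collections import deque


def topological_order(graph):
    """Kahn's algorithm; graph maps node -> (fanins, fanouts)."""
    indeg = {v: len(fin) for v, (fin, _) in graph.items()}
    ready = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in graph[v][1]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order


def levels(graph):
    """Forward level (longest path from a source) and reverse level (to a sink)."""
    order = topological_order(graph)
    fwd, rev = {}, {}
    for v in order:
        fwd[v] = 1 + max((fwd[u] for u in graph[v][0]), default=-1)
    for v in reversed(order):
        rev[v] = 1 + max((rev[w] for w in graph[v][1]), default=-1)
    return fwd, rev


def support(graph, v, pins):
    """Pins reachable in the transitive fanin or fanout cone of v."""
    found = set()
    for side in (0, 1):  # 0: walk fanins, 1: walk fanouts
        stack, seen = list(graph[v][side]), set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            if u in pins:
                found.add(u)
            stack.extend(graph[u][side])
    return found


def level_ok(lv1, lv2, v, v2, bl, br):
    """Hypothetical level constraint: levels may differ by at most Bl / Br."""
    (f1, r1), (f2, r2) = lv1, lv2
    return abs(f1[v] - f2[v2]) <= bl and abs(r1[v] - r2[v2]) <= br


def support_ok(s1, s2, beta):
    """Hypothetical support constraint: overlap >= beta * larger support."""
    return len(s1 & s2) >= beta * max(len(s1), len(s2))


G = {"a": ([], ["g"]), "b": ([], ["g"]), "g": (["a", "b"], ["h"]),
     "h": (["g"], ["y"]), "y": (["h"], [])}
G2 = {"a": ([], ["p"]), "b": ([], ["p"]), "p": (["a", "b"], ["q"]),
      "q": (["p"], ["y"]), "y": (["q"], [])}
pins = {"a", "b", "y"}
lv1, lv2 = levels(G), levels(G2)
print(level_ok(lv1, lv2, "g", "p", 0, 0), level_ok(lv1, lv2, "g", "q", 0, 0))  # True False
print(support_ok(support(G, "g", pins), support(G2, "p", pins), beta=1.0))  # True
```

Pairs rejected by either check can simply be skipped (score fixed at 0) in the similarity iteration, which shrinks the number of node pairs that must be updated.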
  • 17. Incremental Design
    CAD flow:
    - A two-iteration CAD flow is used; the CSBP flow (a) and the from-scratch flow (b) are compared.
    - The optimization "imfs" reduces the number of CLBs by 2%.
    [Figure: CAD flow for incremental design]
    Settings:
    - Two versions of CSBP are compared: a high-quality version (CS) with β = 0.5, inner_num = 1, and Bl = Br = 1; and a turbo version (CS-t) with β = 1, inner_num = 0.1, and Bl = Br = 0.
    - CSBP is implemented in C and evaluated on the 20 largest MCNC benchmarks.
    - The results are averaged over 5 runs on a Linux server with a dual-core 2.19 GHz CPU and 5 GB of memory.
    - The CS2 package [Goldberg'97] is used to solve the maximum matching problem.
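The inner_num setting controls how much annealing effort the refinement spends: VPR sets the number of moves per temperature to inner_num × N^(4/3), where N is the number of blocks. The following is a minimal Python sketch of a low-temperature annealing refinement that uses that move budget. It is a generic illustration, not CSBP's or VPR's code: VPR additionally uses a range limit, an adaptive cooling schedule, and a timing-driven cost, and the temperature parameters here are hypothetical.

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter (bounding-box) wire length of all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def moves_per_temperature(inner_num, n_blocks):
    """VPR-style move budget per temperature: inner_num * N^(4/3)."""
    return max(1, round(inner_num * n_blocks ** (4 / 3)))


def low_temp_anneal(placement, nets, width, height, inner_num=1.0,
                    t_start=1.0, alpha=0.9, t_min=0.01, seed=0):
    """Refine an existing placement by swap moves at a low starting temperature.

    Returns the best placement seen and its cost. The cost is recomputed from
    scratch after every move for brevity; a real placer updates it incrementally.
    """
    rng = random.Random(seed)
    cur = dict(placement)
    occupant = {loc: b for b, loc in cur.items()}
    cur_cost = hpwl(cur, nets)
    best, best_cost = dict(cur), cur_cost
    blocks = list(cur)
    moves = moves_per_temperature(inner_num, len(blocks))
    t = t_start
    while t > t_min:
        for _ in range(moves):
            b = rng.choice(blocks)
            src = cur[b]
            dst = (rng.randrange(width), rng.randrange(height))
            if dst == src:
                continue
            other = occupant.get(dst)  # block to swap with, if any
            cur[b] = dst
            if other is not None:
                cur[other] = src
            new_cost = hpwl(cur, nets)
            delta = new_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                occupant[dst] = b  # accept: update the site map
                if other is not None:
                    occupant[src] = other
                else:
                    del occupant[src]
                cur_cost = new_cost
                if cur_cost < best_cost:
                    best, best_cost = dict(cur), cur_cost
            else:  # reject: undo the move
                cur[b] = src
                if other is not None:
                    cur[other] = dst
        t *= alpha
    return best, best_cost


nets = [["a", "b"], ["b", "c"], ["c", "d"]]
start = {"a": (0, 0), "b": (2, 2), "c": (0, 2), "d": (2, 0)}
best, cost = low_temp_anneal(start, nets, width=3, height=3, inner_num=0.1)
print(hpwl(start, nets), "->", cost)
```

Because the starting placement already comes from the similarity matching, a low starting temperature keeps the annealer from scrambling it, and a smaller inner_num (as in CS-t) cuts the move count in proportion.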
  • 18. Results  Initial placement results  Bounding box cost (bb cost) and delay cost are compared.  Clearly, the initial placement results generated using CS is much better than VPR’s initial results, and is very close to VPR’s final results. 100% 100% 90% 90% 80% 80% Percentage Percentage 70% 70% 60% 60% 50% 50% 40% 40% 30% 30% 20% 20% 10% 10% 0% 0% s38417 s38584 s38417 s38584 s298 s298 pdc alu4 ex1010 pdc alu4 apex2 apex4 ex1010 tseng apex2 apex4 tseng ex5p frisc ex5p seq des frisc des seq diffeq misex3 spla bigkey clma diffeq dsip misex3 spla bigkey clma dsip elliptic elliptic CS-init VPR-final VPR-init CS-init VPR-final VPR-init Comparisons of initial bb cost Comparisons of initial delay cost CS reduces bb cost by 72% on avg. compared to VPR CS reduces delay cost by 53% on avg. compared to VPR
  • 19. Results (Cont.)
    Post-routing results comparison:
    - A low-temperature annealing is applied to the initial results.
    - Wire length, critical delay, and area are compared.
    - The results demonstrate the effectiveness of the pruning techniques, which do not significantly affect quality.
    [Figure: Wire length; series: CS-t, CS, VPR; x-axis: 20 MCNC circuits. CS increases the wire length by 3% on avg.]
    [Figure: Area; series: CS-t, CS, VPR. CS increases the area by 2% on avg.]
    [Figure: Critical delay; series: CS-t, CS, VPR. CS increases the crit. delay by 6% on avg.]
  • 20. Results (Cont.)  Runtime comparison  Only placement time is compared.  CS-t achieves 31x speedup on average, with up to 91x.  More speedup is expected when circuits become larger. 100 90 80 70 Speedups 60 50 40 30 20 10 0 CS-t CS VPR Speedups compared to VPR
  • 21. Design Space Exploration
    CAD flow:
    - Logic-level and algorithm-level design spaces are studied separately.
    - The CSBP flow (a) and the from-scratch flow (b) are compared.
    [Figure: CAD flow for design space exploration]
    Settings:
    - The logic-level design space consists of 19 configurations generated by 19 ABC synthesis scripts in abc.rc (ABC: http://www.eecs.berkeley.edu/~alanmi/abc/).
    - The algorithm-level design space consists of 18 configurations of a constant multiplier generated by CMU SPIRAL [Puschel'04], varying the bits from 7 to 25 (bit = 16 is abandoned due to an ABC crash).
    - Both CS and CS-t are evaluated.
    - The benchmarking environment is the same as for logic-level design space exploration.
  • 22. Logic-level Sample Synthesis Scripts
    Alias      Script
    resyn      "b; rw; rwz; b; rwz; b"
    resyn2     "b; rw; rf; b; rw; rwz; b; rfz; rwz; b"
    resyn2a    "b; rw; b; rw; rwz; b; rwz; b"
    src_rw     "st; rw -l; rwz -l; rwz -l"
    src_rs     "st; rs -K 6 -N 2 -l; rs -K 9 -N 2 -l; rs -K 12 -N 2 -l"
    choice     "fraig_store; resyn; fraig_store; resyn2; fraig_store; fraig_restore"
    rwsat      "st; rw -l; b -l; rw -l; rf -l"
    compress   "b -l; rw -l; rwz -l; b -l; rwz -l; b -l"
    share      "st; multi -m; fx; resyn2"
    Source: http://www.eecs.berkeley.edu/~alanmi/abc/
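Each alias in the table yields one configuration of the same design. A minimal Python sketch of how such a logic-level design space could be generated is shown below; it only builds `abc -c "<commands>"` invocations. The exact command sequence used in the experiments is not given, so the `read_blif; strash; <alias>; write_blif` sequence, the output file naming, and the helper names are assumptions; the aliases must be defined by abc.rc in the ABC session, and an FPGA flow would also need a LUT-mapping step before writing the netlist.

```python
import shlex

# Aliases from the table above; assumes the ABC session has abc.rc loaded.
SCRIPTS = ["resyn", "resyn2", "resyn2a", "src_rw", "src_rs",
           "choice", "rwsat", "compress", "share"]


def abc_command(script, in_blif, out_blif):
    """Build an `abc -c "<commands>"` invocation that applies one script alias."""
    commands = f"read_blif {in_blif}; strash; {script}; write_blif {out_blif}"
    return ["abc", "-c", commands]


def design_space(in_blif):
    """One synthesis run per script alias; each output is one configuration."""
    stem = in_blif.rsplit(".", 1)[0]
    return {s: abc_command(s, in_blif, f"{stem}_{s}.blif") for s in SCRIPTS}


for script, argv in design_space("dsip.blif").items():
    print(shlex.join(argv))
    # subprocess.run(argv, check=True) would execute it if ABC is on PATH
```

The resulting netlists are then placed once from scratch (the reference) and, for the others, with CSBP reusing the reference placement.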
  • 23. Logic Level Results
    Initial results comparison:
    - The number of CLBs and levels vary widely in the logic-level design space.
    - Circuit "dsip" is shown as an example.
    - Bounding box cost and delay cost are compared for the initial placement results.
    [Figure: Initial bb cost of "dsip" across the 19 synthesis scripts; series: CS, CS-t, VPR. CS reduces bb cost by 76% on avg.]
    [Figure: Initial delay cost of "dsip" across the 19 synthesis scripts; y-axis: critical delay; series: CS, CS-t, VPR. CS reduces delay cost by 48% on avg.]
    [Table: Characteristics of logic-level design space]
  • 24. Logic Level Results (Cont.)  Final placement results  Wire length and critical delay of circuit “dsip” are compared.  The final results produced by CS and CS-t are very close or better compared to VPR’s, with 32% overhead for wire length and 20% improvement for critical delay. 100% 100% 80% 80% Percentage Percentage 60% 60% 40% 40% 20% 20% 0% 0% resyn2a resyn2 resyn3 compress2 shake src_rws resyn2rs resyn compress rwsat2 share compress2rs resyn2rsdc choice compress2rsdc choice2 rwsat src_rs src_rw resyn2a resyn2 resyn3 compress2 shake src_rws resyn2rs resyn compress rwsat2 share compress2rs resyn2rsdc choice compress2rsdc choice2 rwsat src_rs src_rw CS-t CS VPR CS-t CS VPR Final wire length comparison of “dsip” Final critical delay comparison of “dsip”
  • 25. Logic Level Results (Cont.)
    Design space shape characterization:
    - We compare the minimal, median, and maximal wire length and critical delay produced by CS, CS-t, and VPR.
    - We also compare the shapes of each configuration over the 19 designs.
    - The almost identical curves show that CSBP accurately depicts the shape of a design space.
    [Figure: Shape of final wire length of circuit "dsip" across the 19 synthesis scripts; series: vpr, cs, cs-t]
    [Figure: Shape of minimal wire length of 20 circuits over 19 designs; series: vpr-min, cs-min, cs-t-min]
    [Figure: Shape of minimal crit. delay of 20 circuits over 19 designs; series: vpr-min, cs-min, cs-t-min]
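The slide judges shape agreement by eye. One way to put a number on it, not used in the slides, is a Spearman rank correlation between VPR's and CSBP's results over the configurations, alongside the minimal/median/maximal summary the slide mentions. A Python sketch follows; the wire-length values are made up for illustration.

```python
import math
import statistics


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var)


def summary(values):
    """(minimal, median, maximal), as compared on the slide."""
    return min(values), statistics.median(values), max(values)


# Hypothetical wire lengths of one circuit under five synthesis scripts.
vpr = [520, 610, 480, 700, 555]
csbp = [540, 650, 500, 720, 560]
print(round(spearman(vpr, csbp), 3), summary(vpr), summary(csbp))
```

A correlation near 1 means both placers rank the configurations in the same order, which is what matters for picking good configurations even when absolute values differ.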
  • 26. Logic Level Results (Cont.)  Runtime comparison  Only placement time is compared.  CS-t achieves 30x speedup on average, with up to 100x.  In practice, one can take advantage of the significant speedup of CS-t to perform quick design space exploration. 100 90 80 70 Speedups 60 50 40 30 20 10 0 s38417 s38584 s298 pdc alu4 apex2 apex4 tseng ex1010 frisc des ex5p seq spla bigkey clma diffeq misex3 dsip elliptic CS CS-t VPR Runtime comparison Speedups compared to VPR (“*” marked time is measured with a timeout )
  • 27. Algorithm Level Results
    Experimental settings:
    - The algorithm-level design is a constant multiplier.
    - The design parameter explored in our experiments is the number of fractional bits, varying from 7 to 25 (bit = 16 is abandoned due to an ABC crash).
    - CMU SPIRAL is used to generate the RTL design based on the Hcub algorithm [Voronenko'07].
    [Table: Characteristics of algorithm-level design space generated by CMU SPIRAL]
    [Figure: An example of a constant parallel multiplier]
    Experimental results:
    - The initial and final placement results are similar to those of the logic-level design space exploration.
    - CS and CS-t achieve 7x and 30x speedups compared to VPR, respectively.
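Why the number of fractional bits shapes the design can be seen with a simple shift-and-add model. SPIRAL's generator uses Hcub [Voronenko'07], a multiple-constant-multiplication algorithm that shares intermediate terms; the Python sketch below instead uses a plain canonical signed-digit (CSD) decomposition of a single constant, only to show that more fractional bits tend to mean more nonzero digits and hence more adders. The constant 1/√2 and the helper names are arbitrary choices for illustration.

```python
def csd_digits(k):
    """Canonical signed-digit recoding of a positive integer, LSB first.

    Digits are in {-1, 0, 1} and no two adjacent digits are both nonzero,
    which minimizes the number of nonzero digits.
    """
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k & 3)  # k % 4 == 1 -> +1, k % 4 == 3 -> -1
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def shift_add_cost(c, frac_bits):
    """Quantize c to frac_bits fractional bits and count shift-add adders."""
    k = round(c * (1 << frac_bits))  # x * c ~= (x * k) >> frac_bits
    nonzero = sum(1 for d in csd_digits(k) if d)
    return k, max(nonzero - 1, 0)


for bits in (7, 12, 25):
    print(bits, shift_add_cost(0.7071067811865476, bits))
```

Each configuration in the algorithm-level space is therefore a structurally related but differently sized netlist, which is the kind of similarity CSBP exploits.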
  • 28. Algorithm Level Results (Cont.)
    Wire length-delay space comparison:
    - The Pareto points, which are the optimal configurations in a design space, are of most interest to IC designers.
    - CS and VPR find the same Pareto points.
    - Bits = 24 is used as the reference circuit.
    [Figure: Wire length-delay space of VPR; x-axis: wire length; y-axis: estimated critical delay; points labeled by bit count, e.g., B7 and B25]
    [Figure: Wire length-delay space of CS; same axes and labels]
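The Pareto points mentioned above are the configurations that no other configuration beats in both wire length and delay. A minimal Python sketch of that definition follows; the coordinates are hypothetical and are not the values plotted on the slide.

```python
def pareto_points(space):
    """Configurations not dominated in (wire length, delay); both minimized."""
    front = []
    for name, (w, d) in space.items():
        dominated = any(
            w2 <= w and d2 <= d and (w2 < w or d2 < d)
            for other, (w2, d2) in space.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (wire length, critical delay in ns) values, not the slide's data.
space = {"B7": (100, 2.0), "B8": (120, 1.8), "B9": (130, 2.1), "B10": (90, 2.5)}
print(pareto_points(space))  # ['B7', 'B8', 'B10']
```

Checking that CS and VPR "find the same Pareto points" then amounts to comparing the two fronts as sets, computed from each placer's results.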
  • 30. Future Work
    Improvement to CSBP:
    - Integrate predefined matchings, for example, name matching, into CSBP to further enhance both the efficiency and the quality of the design.
    Other applications:
    - Study the effectiveness of applying the circuit similarity algorithm to other applications, such as routing and sequential verification for FPGAs.
  • 31. Conclusion
    - Proposed an efficient circuit similarity algorithm.
    - Developed CSBP, a fast circuit similarity-based placement for FPGAs.
    - Applied CSBP to incremental design and design space exploration.
    - Open-source tool available at: http://webdocs.cs.ualberta.ca/~xshi/soft.html
    Applied CSBP to incremental design for FPGAs:
    - CSBP is able to reduce engineering effort by capturing the similarity from previous design iterations.
    - CSBP is 31x faster than VPR.
    Applied CSBP to design space exploration for FPGAs:
    - CSBP can precisely depict the shape of a design space and pinpoint the optimal designs.
    - CSBP is 30x faster than VPR.
  • 32. Xiaoyu Shi, Dahua Zeng, Yu Hu, Guohui Lin, Osmar R. Zaiane
    CSBP: A Fast Circuit Similarity-Based Placement for FPGA Incremental Design and Design Space Exploration