Ultra Low Energy vs Throughput Design Exploration of 65 nm Sub-VT CMOS Digital Filters
          S. M. Yasser Sherazi, Joachim N. Rodrigues, Omer C. Akgun, Henrik Sjöland, and Peter Nilsson
                             Department of Electrical and Information Technology, Lund University
                                               Box 118, SE-221 00 Lund, Sweden
                Email: {yasser.sherazi, joachim.rodrigues, omercan.akgun, henrik.sjoland, peter.nilsson}@eit.lth.se


    Abstract—This paper presents an analysis of the energy dissipation of digital half-band filters operated in the sub-threshold (sub-VT) region under throughput constraints. The speed degradation in the sub-VT domain is counteracted by unfolding the architectures. A 12-bit filter is implemented both in a basic form and as several unfolded structures. The designs are synthesized in a 65 nm low-leakage high-threshold CMOS technology. A sub-VT energy model is then applied to characterize them in the sub-VT domain. The results from the energy model show that the unfolded-by-2 architecture is the most energy efficient. At the energy minimum voltage, it dissipates 22 % less energy per sample than the original filter implementation. The unfolded-by-4 architecture, however, is the best choice for throughput requirements of around 120 ksamples/s to 1 Msamples/s, as it dissipates less energy than any other implementation in this speed range.

                        I. INTRODUCTION
    Miniaturized devices are important in medicine, sensor networks, and many other applications. Engineers aim to develop ultra-compact, low-energy circuits for devices such as hearing aids, medical implants, and remote sensors. There is currently major interest in small wireless devices with ultra-low energy dissipation targeting on-body applications or medical implants. In such devices, minimal energy dissipation in both active and standby mode is of the highest importance. It makes the battery last longer, which matters because changing or charging the battery of a medical implant is non-trivial. Hearing aids that communicate between the two ears to improve binaural hearing may benefit from energy-efficient wireless receivers. Another example is a neural sensor inside the body that communicates with a robotic arm or leg. If a radio is made sufficiently small and with minimal power consumption, vast possibilities for new applications open up.
    In the project at hand, the design constraints are as follows:
    - power consumption below 1 mW in active mode and below 1 µW in standby mode;
    - capacity to handle data rates up to 250 kbit/s;
    - realization on a single chip with an area of 1 mm² in 65 nm CMOS.
    Fig. 1 shows a block diagram of the receiver system. It contains an RF front-end (2.5 GHz), an analog-to-digital converter, a digital baseband for demodulation and control, and finally, an analog decoder that processes the received data packets.

Fig. 1. Receiver system.

    The main focus of this paper is the digital baseband part of the receiver system. The first task of the digital baseband circuit is to re-sample the data from 4 Msamples/s to 250 ksamples/s, so a chain of decimation filters is needed. To achieve lower energy dissipation, we apply voltage scaling extensively, making the designed circuits run in the sub-threshold (sub-VT) domain [1]. When operating in the sub-VT domain, leakage currents must be dealt with, since they are the source of energy dissipation in idle CMOS [2]. This current is an important design constraint, especially in implantable medical devices. Consequently, the circuits need to be optimized in terms of energy dissipation and throughput for sub-VT operation.
    In Sec. II we briefly present the applied sub-VT energy model. In Sec. III we present a 12-bit architecture of a Half Band Digital (HBD) filter, implemented both as a direct-mapped structure and as various unfolded structures. In Sec. IV the results obtained for the HBD filters are shown and discussed, and finally, the conclusions are presented in Sec. V.

978-1-4244-8971-8/10/$26.00 ©2010 IEEE

Fig. 2. Half Band Digital Filter. (a) Single HBD filter. (b) uf-2 HBD filter.

                        II. SUB-VT ENERGY MODEL
    The current of a MOS transistor does not drop to zero when the gate-to-source voltage VGS is equal to or below the threshold voltage VT (VGS ≤ VT). This residual leakage current is commonly referred to as sub-VT or weak-inversion conduction [3]. Although it is small in magnitude, in the sub-VT domain it is used as the operating switching current. The drawback of sub-VT circuits is a speed penalty. However, circuits operating in the sub-VT domain satisfy ultra-low energy requirements, since they dissipate orders of magnitude less energy than super-threshold circuits [3]. The total energy dissipation of static CMOS digital circuits is typically modelled as

$$E_{total} = \underbrace{\alpha C_{tot} V_{DD}^2}_{E_{dyn}} + \underbrace{I_{leak} V_{DD} T_{clk}}_{E_{leak}} + \underbrace{I_{peak} t_{sc} V_{DD}}_{E_{sc}} \tag{1}$$

where Edyn is the average switching energy and Eleak is the leakage energy dissipated during a clock cycle Tclk. In the sub-VT domain, the short-circuit energy Esc is known to be minor compared to the overall energy dissipation and is therefore neglected [1]. In (1), Edyn during one clock period is proportional to the switching activity factor α and the total switched capacitance of the circuit Ctot.
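The following minimal Python sketch shows how the terms in (1) combine. It evaluates Edyn, Eleak and Esc for one clock cycle. The function name `energy_per_cycle` and all numeric inputs are hypothetical illustrations, not code or values from this work. Esc defaults to zero, mirroring the assumption above that it is negligible in the sub-VT domain.

```python
"""Sketch of the per-cycle energy decomposition in (1).

All numeric inputs are hypothetical placeholders, not values from the paper.
"""


def energy_per_cycle(alpha, c_tot, vdd, i_leak, t_clk, i_peak=0.0, t_sc=0.0):
    """Return (E_dyn, E_leak, E_sc) in joules for one clock cycle, following (1)."""
    e_dyn = alpha * c_tot * vdd ** 2   # switching energy: alpha * Ctot * VDD^2
    e_leak = i_leak * vdd * t_clk      # leakage energy over one clock period
    e_sc = i_peak * t_sc * vdd         # short-circuit energy, neglected in sub-VT
    return e_dyn, e_leak, e_sc


if __name__ == "__main__":
    # Hypothetical operating point: alpha = 0.1, Ctot = 1 pF, VDD = 0.3 V,
    # Ileak = 10 nA, and a slow sub-VT clock with Tclk = 10 us.
    e_dyn, e_leak, e_sc = energy_per_cycle(0.1, 1e-12, 0.3, 10e-9, 10e-6)
    print(f"E_dyn = {e_dyn * 1e15:.1f} fJ, E_leak = {e_leak * 1e15:.1f} fJ")
```

With these made-up inputs, the script prints `E_dyn = 9.0 fJ, E_leak = 30.0 fJ`. Because Eleak grows linearly with Tclk, the long clock periods of sub-VT operation can make leakage exceed the switching energy. This is one reason leakage must be handled explicitly.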
   The model used to calculate the energy dissipation delivers SPICE-accurate results [4]. It calculates the total energy dissipation with (2), and the key parameters it requires are obtained during synthesis and high-level simulations:

$$E_T = C_{inv} V_{DD}^2 \left( \mu_e k_{cap} + k_{crit} k_{leak} \, e^{-V_{DD}/(n U_t)} \right) \tag{2}$$

where Cinv is the capacitance of a single inverter, and kleak is the average leakage scaling factor of the circuit, normalized to the average leakage current of a single inverter. The scaling factor kcap is the total capacitance of the circuit normalized to the capacitance of a single inverter. The coefficient kcrit measures the critical path delay of the circuit in units of single-inverter delays. µe is the average switching activity of the circuit per N sample operations. n is a process-dependent constant called the slope factor, and Ut is the thermal voltage, whose value is 26 mV at 300 K. For more details the reader is referred to [4].

                     III. FILTER ARCHITECTURES
   Minimum energy dissipation combined with a medium-to-high throughput requirement puts stringent constraints on a design. It is therefore important to explore and analyse the architectures that best fulfill the requirements. This section presents the HBD filter and the architectural differences between the basic and unfolded versions.

A. Half Band Digital Filter
   An optimized third-order filter structure is evaluated for minimum energy dissipation. The filter structure for the parallel implementation, see Fig. 2(a), is a parallel third-order bi-reciprocal lattice wave digital filter [5]. This structure is considered highly suitable as a decimator or interpolator for sample rate conversion by a factor of two. The benefit of this type of filter is that all filtering may be performed at the lower sample rate with low arithmetic complexity, yielding both low energy dissipation and a small chip area [6]. The transfer function of the proposed filter is

$$H(z) = \frac{1 + 2z^{-1} + 2z^{-2} + z^{-3}}{2 + z^{-2}} \tag{3}$$

Fig. 3. Unfolded architectures of the HBD filter. (a) uf-4 HBD filter. (b) uf-8 HBD filter.

   All filter coefficients are 1 or 2 and may therefore be implemented by simple shifting, which saves area and energy. An initial analysis indicated that a single-sample implementation of this filter would not achieve the required throughput, so unfolding was applied. Unfolding is a transformation technique that computes j samples per clock cycle, where j is the unfolding factor. Unfolding preserves the number of delays in a Data Flow Graph (DFG) [7]. The basic HBD filter architecture was unfolded to obtain three more structures: unfolded by 2 (uf-2), unfolded by 4 (uf-4), and unfolded by 8 (uf-8). In all unfolded architectures the number of registers remains unchanged, whereas the number of adders scales with the unfolding factor. Fig. 2(b) shows the uf-2 version of the filter; its critical path is equal to that of the original HBD filter structure. Fig. 3(a) shows the architecture unfolded by a factor of 4. Its number of adders has increased according to the unfolding factor. Its critical path has also increased, since two of the feedback paths do not contain a register. Similarly, Fig. 3(b) shows the architecture of the uf-8 HBD filter, whose number of adders has increased by a factor of 8 compared to the original HBD structure. Its critical path increases further, since six of the feedback paths do not contain any register. However, more samples are processed per clock cycle in
the unfolded structures, so the throughput gain outweighs the limited increase in the critical path [8].

B. Hardware Mapping
   All cells used for the implementation are from a low-leakage high-threshold (LL-HVT) standard cell library. Tight synthesis constraints were set to obtain minimum area and a short critical path. The parameters for the energy model were retrieved by gate-level simulations with back-annotated toggle and timing information, which includes glitches. The obtained parameters were applied to the energy model to characterize the designs in the sub-VT domain.

                      IV. SIMULATION RESULTS
   In this section the filter architectures are evaluated with respect to energy and throughput. Table I presents the parameters required for the energy model [4], which were extracted during synthesis and energy simulations as discussed in Sec. II. The values of kleak follow the area cost, indicating that leakage is proportional to area. The k parameters of the unfolded implementations are not proportional to the unfolding factor j. This is because the number of internal registers remains unchanged from the basic implementation, although the number of input and output registers increases.

                              TABLE I
        EXTRACTED PARAMETERS FOR THE SYNTHESIZED IMPLEMENTATIONS
     Arch.   kleak     kcap      kcrit   µe      Area    tp [ns]
     par     1113.6    835.4     127.4   0.727   1124    2.84
     uf-2    1695.5    1375.7    127.4   0.708   1836    2.84
     uf-4    3172.5    2797.9    164.2   0.703   3275    3.66
     uf-8    5924.5    5422.3    232.2   0.890   6170    5.22

                              TABLE II
            CHARACTERIZATION OF THE IMPLEMENTATIONS AT EMV
     Arch.   EMV [mV]   Freq. [kHz]   Throughput [ksamples/s]   E/Cyc [fJ]   E/smp [fJ]
     par     241        23.6          23.6                      45           45
     uf-2    238        23.6          47.2                      71           35
     uf-4    247        22.0          88.0                      150          38
     uf-8    251        15.4          123.4                     380          48

                              TABLE III
       PERFORMANCE OF THE IMPLEMENTATIONS AT THE REQUIRED THROUGHPUTS
     Throughput       Circuit   VDD [mV]   E/Cyc [fJ]   E/smp [fJ]
     2 Msamples/s     uf-8      390        656          82.2
     1 Msamples/s     uf-8      368        586          73.3
                      uf-4      376        246          61.5
                      uf-2      400        136          68.3
     500 ksamples/s   uf-8      344        525          65.2
                      uf-4      352        226          54.7
                      uf-2      368        116          58.4
                      par       400        85.2         85.2
     250 ksamples/s   uf-8      300        434          55.0
                      uf-4      320        188          47.0
                      uf-2      344        126          51.8
                      par       368        72.9         72.9

   Energy dissipation is calculated under the assumption that the designs operate at critical-path speed, which gives an Energy Minimum Voltage (EMV) point [9]. The threshold voltage of this LL-HVT device is around 430 mV. Fig. 4(a) presents the designs' energy per clock cycle over a scaled supply voltage VDD. The basic HBD filter implementation, denoted par, dissipates the least energy per clock cycle of the four implementations. This is because its smaller area gives it lower leakage than the other circuits. The par implementation reaches its energy minimum per clock cycle of 45.5 fJ at around 241 mV (indicated by the dot). This minimum is lower than the per-cycle energy of any other architecture at its EMV, which confirms that a smaller area contributes to less energy per clock cycle. However, it is crucial to investigate the energy spent on processing each data sample. The apparent benefit of the par structure is lost when the energy per operation, or energy per sample, is considered. Fig. 4(b) shows the energy dissipation per sample for the different structures. The unfolded circuits perform two, four, and eight times as many operations per clock cycle. Their overall energy per sample is therefore reduced compared to the single-sample implementation. Fig. 4(b) shows that the most efficient architecture is uf-2. It dissipates 35.8 fJ per sample, which is about 22 % less than the energy dissipated by the par structure. The uf-8 architecture, in contrast, is less energy efficient than par in energy per sample at lower voltages, and is almost equal to par near the threshold voltage. The reason for this behaviour is that uf-8 has a higher switching activity µe. Fig. 4(c) shows the maximum attainable frequency as a function of VDD. Because of their shorter critical path, par and uf-2 always reach higher maximum frequencies than the other two structures. uf-8 has the lowest maximum frequency because of its longer critical path, see Table I. Fig. 4(d) shows the energy dissipation of all structures with respect to throughput.
   Table II presents the characteristics of all architectures at EMV: the maximum attainable frequencies, the corresponding throughputs, and the energy dissipated per clock cycle as well as per sample. These simulations show that unfolding improves both energy per sample and throughput.
   The project discussed in Sec. I needs a chain of four HBD filters. Together they reduce the data rate from 4 Msamples/s at the ADC output to the actual data rate of 250 ksamples/s. The first HBD filter must process the input data stream at a rate of 2 Msamples/s. Only the uf-8 HBD near 390 mV fulfills this throughput requirement, as shown in Table III and Fig. 4(d). The 1 Msamples/s requirement of the second HBD is fulfilled by any of the three unfolded structures, uf-8, uf-4, and uf-2. All four structures fulfill the 500 ksamples/s requirement of the third HBD, as shown in Table III and Fig. 4(d). The 250 ksamples/s requirement of the last HBD is again fulfilled by all structures. In Fig. 4(b),
the uf-2 structure appears to be the most energy-efficient circuit. However, when stringent throughput requirements are in place, the uf-4 structure proves to be the best option, as shown in Fig. 4(d) and Table III. This analysis shows that it is crucial to identify the most suitable architecture for the given throughput and energy requirements. Furthermore, [10] argues that low-leakage low-threshold cells are more beneficial at higher throughput rates in the sub-VT domain. This needs to be investigated further for these filter implementations.

Fig. 4. Simulation plots of the HBD filter architectures. (a) Energy [fJ] per clock cycle vs. VDD [V]. (b) Energy per sample [fJ] vs. VDD [V]. (c) Maximum frequency fmax [kHz] vs. VDD [V]. (d) Energy per sample [fJ] vs. throughput [samples/s].

   In [1] it was shown that the supply voltage of sub-VT circuits may be reduced down to 50 mV. In practice, however, functional failures frequently occur at such low voltages due to process variations. It was found in [11] that a supply voltage of 250 mV realizes operation with a failure rate below 0.001 in a 65 nm LL-HVT process. This value is taken as the minimum reliable operating voltage (ROV), indicated in Fig. 4(b) by a line at 250 mV. The simulations show that the designs operate safely above the ROV at the required throughputs, see Table III.

                         V. CONCLUSION
   In this paper, four HBD filter structures are evaluated for minimum energy dissipation in the sub-VT domain for a throughput-constrained system. All architectures are implemented and simulated using 65 nm LL-HVT standard cells: the filters unfolded by 2, 4, and 8, and the basic HBD filter. The application of a sub-VT energy model reveals that an unfolded implementation achieves lower energy dissipation per sample at EMV than a basic HBD filter implementation.

                         ACKNOWLEDGMENT
   The authors would like to thank the Swedish Foundation for Strategic Research (SSF) for funding the Wireless Communication for Ultra Portable Devices projects at Lund University.

                          REFERENCES
[1] E. Vittoz, Low-Power Electronics Design. CRC Press, 2004, ch. 16.
[2] P. van der Meer, Low-Power Deep Sub-Micron CMOS Logic. Kluwer Academic Publishers, 2006.
[3] H. Soeleman et al., "Robust subthreshold logic for ultra-low power operation," IEEE T-VLSI Systems, vol. 9, pp. 90–99, Feb. 2001.
[4] O. C. Akgun and Y. Leblebici, "Energy efficiency comparison of asynchronous and synchronous circuits operating in the sub-threshold regime," Journal of Low Power Electronics, vol. 4, Oct. 2008.
[5] P. Nilsson and M. Torkelson, "Method to save silicon area by increasing the filter order," in Electronic letters. ACM, NY, USA, 1995.
[6] H. Ohlsson et al., "Arithmetic transformations for increased maximal sample rate of bit-parallel bireciprocal lattice wave digital filters," in ISCAS, 2001.
[7] K. K. Parhi, VLSI Digital Signal Processing Systems, 1999, ch. 5.
[8] P. Åstrom, P. Nilsson, et al., "Power reduction in custom CMOS digital filter structures," AICSP Journal, vol. 18, pp. 97–105, 1998.
[9] J. Rodrigues et al., "A <1 pJ Sub-VT cardiac event detector in 65 nm LL-HVT CMOS," VLSI-SOC, 2010.
[10] D. Markovic, J. M. Rabaey, et al., "Ultralow-power design in near-threshold region," Proceedings of the IEEE, 2010.
[11] J. Rodrigues et al., "Energy dissipation reduction of a cardiac event detector in the sub-Vt domain by architectural folding," PATMOS, 2009.

VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
 
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITSVOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
 
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITSVOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
VOLTAGE STACKING FOR SIMPLIFYING POWER MANAGEMENT IN ASYNCHRONOUS CIRCUITS
 
RTB: BIDIRECTIONAL TRANSCEIVER (ESSCIRC85)
RTB: BIDIRECTIONAL TRANSCEIVER (ESSCIRC85)RTB: BIDIRECTIONAL TRANSCEIVER (ESSCIRC85)
RTB: BIDIRECTIONAL TRANSCEIVER (ESSCIRC85)
 
Bl34395398
Bl34395398Bl34395398
Bl34395398
 
Implementation of Area Effective Carry Select Adders
Implementation of Area Effective Carry Select AddersImplementation of Area Effective Carry Select Adders
Implementation of Area Effective Carry Select Adders
 
Optimized Design of an Alu Block Using Power Gating Technique
Optimized Design of an Alu Block Using Power Gating TechniqueOptimized Design of an Alu Block Using Power Gating Technique
Optimized Design of an Alu Block Using Power Gating Technique
 
POWER CONSUMPTION AT CIRCUIT OR LOGIC LEVEL IN CIRCUIT
POWER CONSUMPTION AT CIRCUIT OR LOGIC LEVEL IN CIRCUITPOWER CONSUMPTION AT CIRCUIT OR LOGIC LEVEL IN CIRCUIT
POWER CONSUMPTION AT CIRCUIT OR LOGIC LEVEL IN CIRCUIT
 
A new design of a microstrip rectenna at 5.8 GHz for wireless power transmiss...
A new design of a microstrip rectenna at 5.8 GHz for wireless power transmiss...A new design of a microstrip rectenna at 5.8 GHz for wireless power transmiss...
A new design of a microstrip rectenna at 5.8 GHz for wireless power transmiss...
 
A NOVEL LOW POWER HIGH DYNAMIC THRESHOLD SWING LIMITED REPEATER INSERTION FOR...
A NOVEL LOW POWER HIGH DYNAMIC THRESHOLD SWING LIMITED REPEATER INSERTION FOR...A NOVEL LOW POWER HIGH DYNAMIC THRESHOLD SWING LIMITED REPEATER INSERTION FOR...
A NOVEL LOW POWER HIGH DYNAMIC THRESHOLD SWING LIMITED REPEATER INSERTION FOR...
 
LEAKAGE POWER REDUCTION AND ANALYSIS OF CMOS SEQUENTIAL CIRCUITS
LEAKAGE POWER REDUCTION AND ANALYSIS OF CMOS SEQUENTIAL CIRCUITSLEAKAGE POWER REDUCTION AND ANALYSIS OF CMOS SEQUENTIAL CIRCUITS
LEAKAGE POWER REDUCTION AND ANALYSIS OF CMOS SEQUENTIAL CIRCUITS
 
Energy efficient and high speed domino logic circuits
Energy efficient and high speed domino logic circuitsEnergy efficient and high speed domino logic circuits
Energy efficient and high speed domino logic circuits
 
Sub-Threshold Leakage Current Reduction Techniques In VLSI Circuits -A Survey
Sub-Threshold Leakage Current Reduction Techniques In VLSI Circuits -A SurveySub-Threshold Leakage Current Reduction Techniques In VLSI Circuits -A Survey
Sub-Threshold Leakage Current Reduction Techniques In VLSI Circuits -A Survey
 
FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...
FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...
FORCED STACK SLEEP TRANSISTOR (FORTRAN): A NEW LEAKAGE CURRENT REDUCTION APPR...
 
Comparative Performance Analysis of XORXNOR Function Based High-Speed CMOS Fu...
Comparative Performance Analysis of XORXNOR Function Based High-Speed CMOS Fu...Comparative Performance Analysis of XORXNOR Function Based High-Speed CMOS Fu...
Comparative Performance Analysis of XORXNOR Function Based High-Speed CMOS Fu...
 

Más de srimoorthi (20)

87
8787
87
 
84
8484
84
 
83
8383
83
 
82
8282
82
 
75
7575
75
 
73
7373
73
 
72
7272
72
 
70
7070
70
 
69
6969
69
 
68
6868
68
 
63
6363
63
 
60
6060
60
 
59
5959
59
 
57
5757
57
 
56
5656
56
 
50
5050
50
 
51
5151
51
 
45
4545
45
 
44
4444
44
 
43
4343
43
 

55

Abstract: This paper presents an analysis of the energy dissipation of digital half band filters operated in the sub-threshold (sub-VT) region under throughput constraints. The speed degradation in the sub-VT domain is counteracted by unfolding the architectures. A filter is implemented as a basic 12-bit structure and in several unfolded versions. The designs are synthesized in a 65 nm low-leakage high-threshold CMOS technology, and a sub-VT energy model is applied to characterize them in the sub-VT domain. The results show that the unfolded by 2 architecture is the most energy efficient. At the energy minimum voltage, it dissipates 22 % less energy per sample than the original filter implementation. The unfolded by 4 architecture, however, is the best choice for throughput requirements of around 120 Ksamples/s to 1 Msamples/s, because it dissipates less energy than any other implementation in this speed range.

978-1-4244-8971-8/10/$26.00 ©2010 IEEE

I. INTRODUCTION
Miniaturized devices are important in medicine, sensor networks, and many other applications. Engineers aim to develop ultra compact, low energy circuits for devices such as hearing aids, medical implants, and remote sensors. There is currently major interest in small wireless devices with ultra low energy dissipation that target on-body applications or medical implants. In such devices, minimal energy dissipation in active and standby mode is of the highest importance because it extends battery life, and changing or charging a battery in a medical implant is non-trivial. Hearing aids that communicate between the two ears to improve binaural hearing may benefit from energy efficient wireless receivers. Another example is a neural sensor inside the body that communicates with a robotic arm or leg. A radio that is sufficiently small and has minimal power consumption would open vast possibilities for new applications.

The project has the following design constraints:
- power consumption below 1 mW in active mode and below 1 µW in standby mode;
- the capacity to handle data rates up to 250 kbits/s;
- realization on a single chip with an area of 1 mm² in 65 nm CMOS.

Fig. 1 shows the block diagram of the receiver system. It contains an RF front-end (2.5 GHz), an analog-to-digital converter, a digital baseband for demodulation and control, and an analog decoder that processes the received data packets.

Fig. 1. Receiver system.

This paper focuses on the digital baseband part of the receiver system. The first task of the digital baseband circuit is to re-sample data from 4 Msamples/s to 250 Ksamples/s, so a chain of decimation filters is needed. To lower energy dissipation, we apply voltage scaling aggressively, which makes the designed circuits run in the sub-threshold (sub-VT) domain [1]. In the sub-VT domain, leakage currents must be dealt with, because they are the source of energy dissipation in idle CMOS [2]. This leakage imposes an important design constraint, especially in implantable medical devices. Consequently, the circuits need to be optimized for both energy dissipation and throughput under sub-VT operation.

Sec. II briefly presents the applied sub-VT energy model. Sec. III presents a 12-bit architecture of a Half Band Digital (HBD) filter, implemented both as a direct-mapped structure and in several unfolded structures. Sec. IV shows and discusses the results obtained for the HBD filters, and Sec. V presents the conclusions.

II. SUB-VT ENERGY MODEL
The current of a MOS transistor does not drop to zero when the gate-to-source voltage VGS is equal to or below the threshold voltage VT (VGS ≤ VT). This residual current is a leakage current, commonly referred to as sub-VT or weak inversion conduction [3]. It is small in magnitude, and in the sub-VT domain it is used as the operating switching current. The drawback of sub-VT circuits is a speed penalty. However, circuits operating in the sub-VT region can satisfy ultra low energy requirements, since they dissipate orders of magnitude less energy than super-threshold circuits [3]. The total energy dissipation of static CMOS digital circuits is typically modelled as

$$E_{total} = \underbrace{\alpha C_{tot} V_{DD}^2}_{E_{dyn}} + \underbrace{I_{leak} V_{DD} T_{clk}}_{E_{leak}} + \underbrace{I_{peak} t_{sc} V_{DD}}_{E_{sc}} \tag{1}$$

where Edyn is the average switching energy and Eleak is the leakage energy dissipated during a clock cycle Tclk. In the sub-VT domain, the short-circuit energy Esc is minor compared to the overall energy dissipation, so it is neglected [1]. In (1), Edyn during one clock period is proportional to the switching activity factor α and to the total switched capacitance of the circuit Ctot.

The model used to calculate energy dissipation delivers SPICE-accurate results [4]. It calculates the total energy dissipation by (2). The key parameters it requires are obtained during synthesis and high-level simulations:

$$E_T = C_{inv} V_{DD}^2 \left( \mu_e k_{cap} + k_{crit} k_{leak} e^{-V_{DD}/(n U_t)} \right) \tag{2}$$

The symbols in (2) are defined as follows:
- Cinv is the capacitance of a single inverter.
- kleak is the average leakage scaling factor of the circuit, normalized to the average leakage current of a single inverter.
- kcap is the total capacitance of the circuit, normalized to the capacitance of a single inverter.
- kcrit is the critical path delay of the circuit, measured in single inverter delays.
- µe is the average switching activity of the circuit per N sample operations.
- n is a process-dependent constant called the slope factor.
- Ut is the thermal voltage, 26 mV at 300 K.

For more details the reader is referred to [4].
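The Python sketch below evaluates (2) directly. It plugs in the parameters of Table I (Sec. IV) and scans VDD for the energy minimum. This paper does not report the slope factor n or the inverter capacitance Cinv, so the values used here (n = 1.5, Cinv = 0.865 fF) are assumptions. They were chosen only so that the sketch lands near the operating points of Table II. As VDD approaches 0 V, (2) tends to zero, but the sub-VT approximations behind it no longer hold there. The search is therefore restricted to the 150 to 400 mV range plotted in Fig. 4.

```python
import math

U_T = 0.026        # thermal voltage at 300 K [V], as stated in Sec. II
N_SLOPE = 1.5      # ASSUMPTION: slope factor n (not reported in the paper)
C_INV = 0.865e-15  # ASSUMPTION: single-inverter capacitance [F] (not reported)

# Table I: arch -> (k_leak, k_cap, k_crit, mu_e, samples per clock cycle j)
TABLE_I = {
    "par":  (1113.6, 835.4, 127.4, 0.727, 1),
    "uf-2": (1695.5, 1375.7, 127.4, 0.708, 2),
    "uf-4": (3172.5, 2797.9, 164.2, 0.703, 4),
    "uf-8": (5924.5, 5422.3, 232.2, 0.890, 8),
}


def energy_per_cycle(vdd, k_leak, k_cap, k_crit, mu_e):
    """Eq. (2): switching term plus leakage over one critical-path clock period [J]."""
    leakage = k_crit * k_leak * math.exp(-vdd / (N_SLOPE * U_T))
    return C_INV * vdd ** 2 * (mu_e * k_cap + leakage)


def energy_minimum(k_leak, k_cap, k_crit, mu_e, v_lo=0.15, v_hi=0.40, step=1e-4):
    """Grid search for the energy minimum voltage (EMV) inside [v_lo, v_hi] [V].

    The window matters: (2) goes to zero as VDD -> 0, but the model is only
    meaningful for VDD of several U_T, so only the plotted range is scanned.
    """
    n_steps = int(round((v_hi - v_lo) / step))
    candidates = (v_lo + i * step for i in range(n_steps + 1))
    emv = min(candidates, key=lambda v: energy_per_cycle(v, k_leak, k_cap, k_crit, mu_e))
    return emv, energy_per_cycle(emv, k_leak, k_cap, k_crit, mu_e)


def characterize():
    """Return {arch: (EMV [V], energy per cycle [J], energy per sample [J])}."""
    results = {}
    for arch, (k_leak, k_cap, k_crit, mu_e, j) in TABLE_I.items():
        emv, e_cyc = energy_minimum(k_leak, k_cap, k_crit, mu_e)
        results[arch] = (emv, e_cyc, e_cyc / j)  # j samples share one cycle
    return results


if __name__ == "__main__":
    for arch, (emv, e_cyc, e_smp) in characterize().items():
        print(f"{arch:5s} EMV = {emv * 1e3:5.1f} mV   "
              f"E/cyc = {e_cyc * 1e15:6.1f} fJ   E/smp = {e_smp * 1e15:5.1f} fJ")
```

With these assumed constants, the scan places the minima between roughly 239 and 252 mV. It also reproduces the per-cycle ordering par < uf-2 < uf-4 < uf-8, with uf-2 lowest in energy per sample. That agreement hinges on the assumed n and Cinv, so it does not validate the model.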
Fig. 2. Half Band Digital Filter. (a) Single HBD filter. (b) uf-2 HBD filter.

Fig. 3. Unfolded architectures of the HBD filter. (a) uf-4 HBD filter. (b) uf-8 HBD filter.

III. FILTER ARCHITECTURES
Minimum energy dissipation combined with a medium to high throughput requirement puts stringent constraints on a design. Therefore, it is important to explore and analyse the architectures that best fulfill the requirements. This section presents the HBD filter and the architectural differences between the basic and unfolded versions.

A. Half Band Digital Filter
An optimized third-order filter structure is evaluated for minimum energy dissipation. The filter structure for the parallel implementation, see Fig. 2(a), is a parallel third-order bi-reciprocal lattice wave digital filter [5]. It is considered highly suitable as a decimator or interpolator for sample rate conversion by a factor of two. The benefit of this type of filter is that all filtering may be performed at the lower sample rate with low arithmetic complexity, which yields both low energy dissipation and a low chip area [6]. The transfer function of the proposed filter is

$$H(z) = \frac{1 + 2z^{-1} + 2z^{-2} + z^{-3}}{2 + z^{-2}} \tag{3}$$

All the filter coefficients are 1 or 2. They can therefore be implemented by simple shifting, which saves both area and energy dissipation.

An initial analysis indicated that a single-sample implementation of this filter would not reach the required throughput, so unfolding was applied. Unfolding is a transformation technique that computes j samples per clock cycle, where j is the unfolding factor. It preserves the number of delays in a data flow graph (DFG) [7]. Unfolding the basic HBD filter architecture produced three more structures: unfolded by 2 (uf-2), unfolded by 4 (uf-4), and unfolded by 8 (uf-8). In all unfolded architectures the number of registers remains unchanged, whereas the number of adders scales with the unfolding factor.

The unfolded structures differ as follows:
- uf-2, Fig. 2(b): the critical path is equal to that of the original HBD filter structure.
- uf-4, Fig. 3(a): the number of adders increases with the unfolding factor. The critical path also increases, because two of the feedback paths do not contain a register.
- uf-8, Fig. 3(b): the number of adders increases by a factor of 8 compared to the original HBD structure. The critical path increases further, because six of the feedback paths do not contain any register.

However, the unfolded structures process more samples per clock cycle. Their throughput therefore increases, at the cost of only a limited increase in the critical path [8].
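The register counts above follow from the unfolding transformation itself. In Parhi's formulation [7], an edge carrying w delays is replaced by j edges, and the i-th edge carries floor((i + w)/j) delays, so the total number of delays is preserved. The Python sketch below applies that rule to a single feedback edge with w = 2 delays. That w is an assumption read off the y[n-2] term of (3), not taken from the lattice wave digital filter netlist of Fig. 2. The sketch also runs (3) as a plain difference equation to show j outputs being produced per iteration. It illustrates the counting argument and does not model the synthesized hardware.

```python
def unfold_edge(w, j):
    """Parhi's unfolding rule: an edge with w delays becomes j edges; edge i
    (i = 0..j-1) runs from node copy i to copy (i + w) % j with (i + w) // j delays."""
    return [((i + w) % j, (i + w) // j) for i in range(j)]


def feedback_paths_without_register(j, w=2):
    """Number of unfolded copies of a w-delay feedback edge left with zero delays."""
    return sum(1 for _, delays in unfold_edge(w, j) if delays == 0)


def _at(seq, idx):
    """seq[idx] with zero initial conditions for negative indices."""
    return seq[idx] if idx >= 0 else 0.0


def hbd_output(x, y, n):
    """One output of (3) written as a difference equation:
    2*y[n] + y[n-2] = x[n] + 2*x[n-1] + 2*x[n-2] + x[n-3]."""
    acc = _at(x, n) + 2 * _at(x, n - 1) + 2 * _at(x, n - 2) + _at(x, n - 3)
    return (acc - _at(y, n - 2)) / 2  # division by 2 maps to a shift in hardware


def hbd_reference(x):
    """Single-sample implementation: one output per clock cycle."""
    y = []
    for n in range(len(x)):
        y.append(hbd_output(x, y, n))
    return y


def hbd_unfolded(x, j):
    """Unfolded by j: iteration k consumes x[jk..jk+j-1] and produces j outputs.
    In hardware the j inner steps are parallel copies; copies i >= 2 depend on
    outputs of the same iteration, which is what lengthens the critical path."""
    if len(x) % j:
        raise ValueError("input length must be a multiple of the unfolding factor")
    y = [0.0] * len(x)
    for k in range(len(x) // j):
        for i in range(j):
            n = j * k + i
            y[n] = hbd_output(x, y, n)
    return y


if __name__ == "__main__":
    for j in (2, 4, 8):
        delays = [d for _, d in unfold_edge(2, j)]
        print(f"uf-{j}: delays per copy {delays}, total {sum(delays)}, "
              f"register-free feedback paths {feedback_paths_without_register(j)}")
    impulse = [1.0] + [0.0] * 15
    print(hbd_unfolded(impulse, 4) == hbd_reference(impulse))
```

For j = 2, 4, and 8, the rule leaves 0, 2, and 6 register-free feedback edges, while the total stays at two delays. This agrees with the critical-path behaviour described for uf-2, uf-4, and uf-8.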
B. Hardware Mapping
All the cells used for the implementation come from a low-leakage high-threshold (LL-HVT) standard cell library. Tight synthesis constraints were set to obtain minimum area and a short critical path. The parameters for the energy model were retrieved from gate-level simulations with back-annotated toggle and timing information, which includes glitches. The obtained parameters were applied to the energy model to characterize the designs in the sub-VT domain.

IV. SIMULATION RESULTS
This section evaluates the filter architectures with respect to energy and throughput. Table I lists the parameters required for the energy model [4], extracted during synthesis and energy simulations as discussed in Sec. II. The values of kleak follow the area cost, which indicates that leakage is proportional to area. The k parameters of the unfolded implementations are not proportional to the unfolding factor j. The number of internal registers remains unchanged from the basic implementation; only the number of input and output registers increases.

TABLE I
EXTRACTED PARAMETERS FOR THE SYNTHESIZED IMPLEMENTATIONS
Arch.   kleak    kcap     kcrit   µe      Area   tp [ns]
par     1113.6    835.4   127.4   0.727   1124   2.84
uf-2    1695.5   1375.7   127.4   0.708   1836   2.84
uf-4    3172.5   2797.9   164.2   0.703   3275   3.66
uf-8    5924.5   5422.3   232.2   0.890   6170   5.22

The energy dissipation is calculated under the assumption that the designs operate at critical path speed, which gives an Energy Minimum Voltage (EMV) point [9]. The threshold voltage of this LL-HVT device is around 430 mV. Fig. 4(a) presents the energy per clock cycle of each design over a scaled supply voltage VDD. The basic HBD filter implementation, denoted par, dissipates the least energy per clock cycle of the four implementations, because its smaller area gives it less leakage than the other circuits. Its energy minimum per clock cycle is 45.5 fJ, reached around 241 mV (indicated by the dot). This is lower than the minimum energy per clock cycle of any other architecture, which confirms that a smaller area leads to less energy per clock cycle.

However, the energy spent on processing each sample of data must also be examined. Once energy per operation, or energy per sample, is considered, the apparent benefit of the par structure is lost. Fig. 4(b) shows the energy dissipation per sample for the different structures. The unfolded circuits perform two, four, and eight times as many operations per clock cycle. Their energy per sample is therefore lower than that of a single-sample implementation. Fig. 4(b) shows that uf-2 is the most efficient architecture. It dissipates 35.8 fJ per sample, roughly 22 % less than the par structure (35 versus 45 fJ per sample in Table II). At lower voltages, the uf-8 architecture is less energy efficient than par even in energy per sample, and near the threshold voltage the two are almost equal. This behaviour results from the higher switching activity µe of uf-8.

Fig. 4(c) shows the maximum attainable frequency versus VDD. Because of their shorter critical path, par and uf-2 always reach higher maximum frequencies than the other structures. uf-8 has the lowest maximum speed because of its longer critical path, see Table I. Fig. 4(d) shows the energy dissipation of all the structures with respect to throughput. Table II presents the characteristics of all the architectures at EMV: the maximum attainable frequency, the corresponding throughput, and the energy dissipated per clock cycle and per sample. These simulations show that the unfolding technique improves both energy per sample and throughput.

TABLE II
CHARACTERIZATION OF THE IMPLEMENTATIONS AT EMV
Arch.   EMV [mV]   Freq. [kHz]   Throughput [ksamples/s]   E/Cyc [fJ]   E/smp [fJ]
par     241        23.6          23.6                      45           45
uf-2    238        23.6          47.2                      71           35
uf-4    247        22.0          88.0                      150          38
uf-8    251        15.4          123.4                     380          48

The project discussed in Sec. I needs a chain of four HBD filters that reduces the data rate from 4 Msamples/s at the ADC output to the actual data rate of 250 Ksamples/s. The stages have the following requirements (Table III and Fig. 4(d)):
- The first HBD filter must process the input data stream at 2 Msamples/s. Only the uf-8 HBD meets this, near 390 mV.
- The second HBD requires 1 Msamples/s, which any of the three unfolded structures (uf-8, uf-4, and uf-2) can meet.
- The third HBD requires 500 Ksamples/s, which all four structures meet.
- The last HBD requires 250 Ksamples/s, which all structures meet again.

In Fig. 4(b), uf-2 appears to be the most energy efficient circuit. However, under stringent throughput requirements, uf-4 proves to be the best option, as shown in Fig. 4(d) and Table III. This analysis shows that the most suitable architecture must be identified for each set of throughput and energy requirements. Furthermore, [10] argues that low-leakage low-threshold cells are more beneficial at higher throughput rates in the sub-VT domain, which needs further investigation for these filter implementations.

TABLE III
PERFORMANCE OF THE IMPLEMENTATIONS AT REQUIRED THROUGHPUTS
Throughput       Arch.   VDD [mV]   E/Cyc [fJ]   E/smp [fJ]
2 Msamples/s     uf-8    390        656          82.2
1 Msamples/s     uf-8    368        586          73.3
1 Msamples/s     uf-4    376        246          61.5
1 Msamples/s     uf-2    400        136          68.3
500 Ksamples/s   uf-8    344        525          65.2
500 Ksamples/s   uf-4    352        226          54.7
500 Ksamples/s   uf-2    368        116          58.4
500 Ksamples/s   par     400        85.2         85.2
250 Ksamples/s   uf-8    300        434          55.0
250 Ksamples/s   uf-4    320        188          47.0
250 Ksamples/s   uf-2    344        126          51.8
250 Ksamples/s   par     368        72.9         72.9

[1] showed that the supply voltage of sub-VT circuits may be reduced down to 50 mV. In practice, however, functional failures frequently occur at such low voltages due to process variations. [11] found that a 65 nm LL-HVT process needs a supply voltage of 250 mV to operate with a failure rate below 0.001. This value is taken as the minimum reliable operating voltage (ROV), indicated in Fig. 4(b) by a line at 250 mV. The simulations show that all implementations operate safely above the ROV at the required throughputs, see Table III.
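The selection argument of this section can be sketched in Python over the Table III data. It keeps the implementations that reach the required throughput at or above the ROV, then takes the one with the lowest energy per sample. The function and table names are hypothetical and serve only as illustration.

```python
ROV_MV = 250  # minimum reliable operating voltage from [11] [mV]

# Table III: required throughput [samples/s] -> (arch, VDD [mV], E/cyc [fJ], E/smp [fJ])
TABLE_III = {
    2_000_000: [("uf-8", 390, 656, 82.2)],
    1_000_000: [("uf-8", 368, 586, 73.3), ("uf-4", 376, 246, 61.5),
                ("uf-2", 400, 136, 68.3)],
    500_000: [("uf-8", 344, 525, 65.2), ("uf-4", 352, 226, 54.7),
              ("uf-2", 368, 116, 58.4), ("par", 400, 85.2, 85.2)],
    250_000: [("uf-8", 300, 434, 55.0), ("uf-4", 320, 188, 47.0),
              ("uf-2", 344, 126, 51.8), ("par", 368, 72.9, 72.9)],
}

# Decimation chain from Sec. I: 4 Msamples/s -> 250 Ksamples/s in four half-band stages.
CHAIN = [2_000_000, 1_000_000, 500_000, 250_000]


def best_architecture(throughput, table=TABLE_III, rov_mv=ROV_MV):
    """Lowest energy-per-sample candidate for a throughput that runs at or above the ROV."""
    candidates = [c for c in table[throughput] if c[1] >= rov_mv]
    if not candidates:
        raise ValueError(f"no reliable implementation for {throughput} samples/s")
    return min(candidates, key=lambda c: c[3])


if __name__ == "__main__":
    for stage, rate in enumerate(CHAIN, start=1):
        arch, vdd, _, e_smp = best_architecture(rate)
        print(f"HBD stage {stage}: {rate / 1e6:.2f} Msamples/s -> "
              f"{arch} at {vdd} mV, {e_smp} fJ/sample")
```

Table III lists only implementations that meet each rate, so the sketch checks throughput only implicitly. A fuller flow would derive throughput as j times the maximum clock frequency at a given VDD from Fig. 4(c). With the tabulated data, the sketch picks uf-8 for the first stage and uf-4 for the remaining three.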
Fig. 4. Simulation plots of the HBD filter architectures (par, uf-2, uf-4, uf-8). (a) Energy per clock cycle [fJ] vs VDD [V]. (b) Energy per sample [fJ] vs VDD [V]. (c) Maximum frequency fmax [kHz] vs VDD [V]. (d) Energy per sample [fJ] vs throughput [samples/s].

V. CONCLUSION
This paper evaluates four HBD filter structures for minimum energy dissipation in the sub-VT domain for a throughput-constrained system. All architectures (the basic HBD filter and the versions unfolded by 2, 4, and 8) are implemented and simulated using 65 nm LL-HVT standard cells. Applying a sub-VT energy model shows that an unfolded implementation achieves lower energy dissipation per sample at EMV than the basic HBD filter implementation.

ACKNOWLEDGMENT
The authors would like to thank the Swedish Foundation for Strategic Research (SSF) for funding the Wireless Communication for Ultra Portable Devices projects at Lund University.

REFERENCES
[1] E. Vittoz, Low-Power Electronics Design. CRC Press, 2004, ch. 16.
[2] P. van der Meer, Low-Power Deep Sub-Micron CMOS Logic. Kluwer Academic Publishers, 2006.
[3] H. Soeleman et al., "Robust subthreshold logic for ultra-low power operation," IEEE T-VLSI Systems, vol. 9, pp. 90–99, Feb. 2001.
[4] O. C. Akgun and Y. Leblebici, "Energy efficiency comparison of asynchronous and synchronous circuits operating in the sub-threshold regime," Journal of Low Power Electronics, vol. 4, Oct. 2008.
[5] P. Nilsson and M. Torkelson, "Method to save silicon area by increasing the filter order," in Electronic Letters. ACM, NY, USA, 1995.
[6] H. Ohlsson et al., "Arithmetic transformations for increased maximal sample rate of bit-parallel bireciprocal lattice wave digital filters," in ISCAS, 2001.
[7] K. K. Parhi, VLSI Digital Signal Processing Systems, 1999, ch. 5.
[8] P. Åstrom, P. Nilsson, et al., "Power reduction in custom CMOS digital filter structures," AICSP Journal, vol. 18, pp. 97–105, 1998.
[9] J. Rodrigues et al., "A <1 pJ sub-VT cardiac event detector in 65 nm LL-HVT CMOS," VLSI-SoC, 2010.
[10] D. Markovic, J. M. Rabaey, et al., "Ultralow-power design in near-threshold region," Proceedings of the IEEE, 2010.
[11] J. Rodrigues et al., "Energy dissipation reduction of a cardiac event detector in the sub-VT domain by architectural folding," PATMOS, 2009.