Flexible DSP Accelerator Architecture Exploiting
Carry-Save Arithmetic
Abstract:
Hardware acceleration has proven to be an extremely promising implementation strategy for the
digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific
integrated circuit design approach, in this brief we present a novel accelerator architecture
comprising flexible computational units that support the execution of a large set of operation
templates found in DSP kernels. We differentiate from previous works on flexible accelerators by
enabling computations to be aggressively performed with carry-save (CS) formatted data.
Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS
optimizations to be performed in a larger scope than in previous approaches. The proposed
architecture is analyzed in terms of logic size, area, and power consumption using Xilinx 14.2.
Enhancement of the project:
Implement the other operation templates of the FCU.
Existing system:
Modern embedded systems target high-end application domains requiring efficient
implementations of computationally intensive digital signal processing (DSP) functions. The
incorporation of heterogeneity through specialized hardware accelerators improves performance
and reduces energy consumption. Although application-specific integrated circuits (ASICs) form
the ideal acceleration solution in terms of performance and power, their inflexibility leads to
increased silicon complexity, as multiple instantiated ASICs are needed to accelerate various
kernels. Many researchers have proposed the use of domain-specific coarse-grained
reconfigurable accelerators in order to increase ASICs' flexibility without significantly
compromising their performance.
The aforementioned reconfigurable architectures exclude arithmetic optimizations during the
architectural synthesis and consider them only at the internal circuit structure of
primitive components, e.g., adders, during the logic synthesis. However, research activities have
shown that arithmetic optimizations at higher abstraction levels than the structural circuit one
significantly impact datapath performance. In earlier work, timing-driven optimizations based on
carry-save (CS) arithmetic were performed at the post-Register Transfer Level (RTL) design
stage. Elsewhere, common subexpression elimination in CS computations is used to optimize linear
DSP circuits. Verma et al. developed transformation techniques on the application's DFG to
maximize the use of CS arithmetic prior to the actual datapath synthesis. The aforementioned CS
optimization approaches target inflexible datapath, i.e., ASIC, implementations. Recently, Xydis
et al. proposed a flexible architecture combining the ILP and pipelining techniques with
CS-aware operation chaining. However, all the aforementioned solutions feature an inherent
limitation, i.e., CS optimization is limited to merging only additions/subtractions. A CS-to-binary
conversion is inserted before each operation that differs from addition/subtraction,
e.g., multiplication, thus allocating multiple CS-to-binary conversions that heavily degrade
performance due to time-consuming carry propagations.
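The cost argument above can be illustrated with a minimal Python sketch of a 3:2 carry-save adder. It is a behavioral illustration only, not the circuit used in any of the cited works; the function names and the 16-bit mask are assumptions chosen for the example. Each CS addition works on every bit position independently, and only the final conversion needs a carry-propagating add.

```python
MASK = 0xFFFF  # assumed 16-bit word width for this illustration


def csa_3to2(a, b, c):
    """Compress three words into a (carry, sum) pair.

    Every bit position is computed independently, so no carry ripples
    across the word.
    """
    s = (a ^ b ^ c) & MASK
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return cy, s


def cs_to_binary(cy, s):
    """Final conversion: the only step that needs carry propagation."""
    return (cy + s) & MASK


def chained_sum(values):
    """Accumulate any number of words in CS form, converting once at the end."""
    cy, s = 0, 0
    for v in values:
        cy, s = csa_3to2(cy, s, v & MASK)
    return cs_to_binary(cy, s)
```

For example, `chained_sum([1000, 2000, 3000, 4000])` performs four carry-free compressions and a single conversion. If a conversion were forced before every non-additive operation, as in the approaches described above, the carry-propagating step would be paid repeatedly.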
Disadvantages:
• High area
• High power consumption
Proposed system:
The proposed flexible accelerator architecture is shown in Fig. 1. Each FCU operates directly on
CS operands and produces data in the same form for direct reuse of intermediate results. Each
FCU operates on 16-bit operands. Such a bit-length is adequate for most DSP datapaths, but
the architectural concept of the FCU can be straightforwardly adapted for smaller or larger
bit-lengths. The number of FCUs is determined at design time based on the ILP and area constraints
imposed by the designer. The CStoBin module is a ripple-carry adder and converts the CS form
to the two's complement one. The register bank consists of scratch registers and is used for
storing intermediate results and sharing operands among the FCUs. Different DSP kernels (i.e.,
different register allocation and data communication patterns per kernel) can be mapped onto the
proposed architecture using post-RTL datapath interconnection sharing techniques. The control
unit drives the overall architecture (i.e., communication between the data port and the register
bank, configuration words of the FCUs, and selection signals for the multiplexers) in each clock
cycle.
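As a rough illustration of what the CStoBin module computes, the following Python sketch models a 16-bit ripple-carry adder bit by bit. It is a behavioral model written for this summary, not the actual RTL; the names `ripple_carry_add` and `to_signed` are hypothetical.

```python
WIDTH = 16  # operand width stated in the text


def ripple_carry_add(xc, xs, width=WIDTH):
    """Add the carry and sum words of a CS operand with a full-adder chain.

    The carry ripples from the least to the most significant bit, which is
    why this conversion is the slow step in a CS datapath.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (xc >> i) & 1
        b = (xs >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    return result  # the final carry-out is dropped (modulo 2**width)


def to_signed(value, width=WIDTH):
    """Interpret a width-bit pattern as a two's complement integer."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit
```

Here `to_signed(ripple_carry_add(0xFFF0, 0x0005))` yields -11, the two's complement reading of the CS pair {0xFFF0, 0x0005}.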
Structure of the Proposed Flexible Computational Unit:
The structure of the FCU (Fig. 2) has been designed to enable high-performance flexible
operation chaining based on a library of operation templates. Each FCU can be configured to
any of the T1–T5 operation templates shown in Fig. 3.
Figure 1: Abstract form of the flexible datapath.
The proposed FCU enables intra-template operation chaining by fusing the additions performed
before/after the multiplication and performs any partial operation template of the following
complex operations:
W∗ = A × (X∗ + Y∗) + K∗ (1)
W∗ = A × K∗ + (X∗ + Y∗) (2)
Figure 2: FCU.
The following relation holds for all CS data: X∗ = {XC, XS} = XC + XS. The operand A is a two's
complement number. The alternative execution paths in each FCU are specified by properly
setting the control signals of the multiplexers MUX1 and MUX2 (Fig. 2). The
multiplexer MUX0 outputs Y∗ when CL0 = 0 (i.e., X∗ + Y∗ is carried out) or the 1's complement
of Y∗ when X∗ − Y∗ is required and CL0 = 1. The two's complement 4:2 CS adder produces
N∗ = X∗ + Y∗ when the input carry equals 0 or N∗ = X∗ − Y∗ when the input carry equals 1. MUX1
determines whether N∗ (1) or K∗ (2) is multiplied with A. MUX2 specifies whether K∗ (1) or N∗ (2)
is added to the multiplication product. The multiplexer MUX3 accepts the output of MUX2 and
its 1's complement and outputs the former when an addition with the multiplication product
is required (i.e., CL3 = 0) or the latter when a subtraction is carried out (i.e., CL3 = 1). The
1-bit ace for the subtraction is added in the CS adder tree.
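The multiplexer settings described above can be summarized in a small value-level Python model. This is only a sketch under stated assumptions: it works on integer values rather than 16-bit CS bit vectors, it ignores truncation and overflow, and the select signals `cl1` and `cl2` (and their 0/1 encoding) are hypothetical names, since the text names only CL0 and CL3 explicitly.

```python
def fcu(a, x, y, k, cl0=0, cl1=0, cl2=0, cl3=0):
    """Value-level model of one FCU evaluation.

    a        : two's complement operand (plain int)
    x, y, k  : CS operands given as (carry, sum) pairs, value = carry + sum
    cl0      : 0 -> N = X + Y, 1 -> N = X - Y            (MUX0 and 4:2 CS adder)
    cl1      : 0 -> multiply A by N, 1 -> by K           (MUX1, assumed encoding)
    cl2      : 0 -> add K to the product, 1 -> add N     (MUX2, assumed encoding)
    cl3      : 0 -> add the MUX2 output, 1 -> subtract it (MUX3)
    """
    xv, yv, kv = sum(x), sum(y), sum(k)
    n = xv - yv if cl0 else xv + yv
    multiplicand = kv if cl1 else n
    addend = n if cl2 else kv
    if cl3:
        addend = -addend
    return a * multiplicand + addend


# Operation (1): W = A * (X + Y) + K
w1 = fcu(3, (1, 2), (4, 0), (5, 5))
# Operation (2): W = A * K + (X + Y)
w2 = fcu(3, (1, 2), (4, 0), (5, 5), cl1=1, cl2=1)
```

With A = 3, X∗ = 3, Y∗ = 4 and K∗ = 10, the first call evaluates operation (1) and the second evaluates operation (2). In the hardware described above the result stays in CS form, whereas this model returns a single integer for readability.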
Figure 3: FCU template library.
The multiplier comprises a CS-to-MB (modified Booth) module, which adopts a recently proposed
technique to recode the 17-bit P∗ into its respective MB digits with minimal carry propagation.
The multiplier's product consists of 17 bits. The multiplier includes a compensation method for
reducing the error imposed on the product's accuracy by the truncation technique. However, since
all the FCU inputs consist of 16 bits and provided that there are no overflows, the 16 most
significant bits of the 17-bit W∗ (i.e., the output of the Carry-Save Adder (CSA) tree, and thus of
the FCU) are inserted into the appropriate FCU when requested.
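To make the notion of MB digits concrete, here is a minimal Python sketch of standard radix-4 modified Booth recoding applied to an ordinary two's complement number. It is not the CS-to-MB technique referenced above, which recodes directly from the carry-save form; the function names and the 16-bit default width are assumptions for illustration.

```python
def modified_booth_digits(value, width=16):
    """Return the radix-4 MB digits of a two's complement value, LSB first.

    Each digit is -2*b[2j+1] + b[2j] + b[2j-1], with b[-1] = 0, so it lies
    in {-2, -1, 0, 1, 2}. The width is assumed to be even.
    """
    bits = value & ((1 << width) - 1)
    digits = []
    prev = 0  # implicit bit b[-1]
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def from_digits(digits):
    """Rebuild the signed value: sum of digit * 4**position."""
    return sum(d * 4 ** j for j, d in enumerate(digits))


def booth_multiply(a, p, width=16):
    """Multiply a by p using one partial product per MB digit of p."""
    return sum(d * a * 4 ** j
               for j, d in enumerate(modified_booth_digits(p, width)))
```

Recoding halves the number of partial products relative to a bit-by-bit multiplier, since a 16-bit operand yields eight digits. For instance, 7 is recoded as the digits -1 and 2 (that is, -1 + 2 × 4) followed by zeros.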
Advantages:
• High degree of computational density
• Reduced area
• Reduced power consumption
Software implementation:
• ModelSim
• Xilinx ISE

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 

Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic

  • 1. Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic Abstract: Hardware acceleration has proven to be an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief, we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. The proposed architecture is analyzed in terms of logic size, area, and power consumption using Xilinx 14.2. Enhancement of the project: Implement the other templates of the flexible computational unit (FCU). Existing system: Modern embedded systems target high-end application domains requiring efficient implementations of computationally intensive digital signal processing (DSP) functions. The incorporation of heterogeneity through specialized hardware accelerators improves performance and reduces energy consumption. Although application-specific integrated circuits (ASICs) form the ideal acceleration solution in terms of performance and power, their inflexibility leads to increased silicon complexity, as multiple instantiated ASICs are needed to accelerate various kernels. Many researchers have proposed the use of domain-specific coarse-grained reconfigurable accelerators in order to increase ASICs’ flexibility without significantly compromising their performance. The aforementioned reconfigurable architectures exclude arithmetic optimizations during the architectural synthesis and consider them only at the internal circuit structure of primitive components, e.g., adders, during the logic synthesis.
However, research activities have shown that arithmetic optimizations at higher abstraction levels than the structural circuit level significantly impact datapath performance. Timing-driven optimizations based on carry-save (CS) arithmetic have been performed at the post-Register Transfer Level (RTL) design stage, and common subexpression elimination in CS computations has been used to optimize linear DSP circuits. Verma et al. developed transformation techniques on the application’s data flow graph (DFG) to maximize the use of CS arithmetic prior to the actual datapath synthesis. The aforementioned CS optimization approaches target inflexible datapath, i.e., ASIC, implementations. Recently, Xydis
  • 2. et al. proposed a flexible architecture combining instruction-level parallelism (ILP) and pipelining techniques with CS-aware operation chaining. However, all the aforementioned solutions share an inherent limitation: CS optimization is limited to merging only additions/subtractions. A CS-to-binary conversion is inserted before each operation that differs from addition/subtraction, e.g., multiplication, thus allocating multiple CS-to-binary conversions that heavily degrade performance due to time-consuming carry propagations. Disadvantages: high area; high power consumption. Proposed system: The proposed flexible accelerator architecture is shown in Fig. 1. Each FCU operates directly on CS operands and produces data in the same form for direct reuse of intermediate results. Each FCU operates on 16-bit operands. Such a bit-length is adequate for most DSP datapaths, but the architectural concept of the FCU can be straightforwardly adapted for smaller or larger bit-lengths. The number of FCUs is determined at design time based on the ILP and area constraints imposed by the designer. The CStoBin module is a ripple-carry adder that converts the CS form to two’s complement form. The register bank consists of scratch registers and is used for storing intermediate results and sharing operands among the FCUs. Different DSP kernels (i.e., different register allocation and data communication patterns per kernel) can be mapped onto the proposed architecture using post-RTL datapath interconnection sharing techniques. The control unit drives
  • 3. the overall architecture (i.e., communication between the data port and the register bank, configuration words of the FCUs, and selection signals for the multiplexers) in each clock cycle. Structure of the Proposed Flexible Computational Unit: The structure of the FCU (Fig. 2) has been designed to enable high-performance flexible operation chaining based on a library of operation templates. Each FCU can be configured to any of the T1–T5 operation templates shown in Fig. 3. Figure 1: Abstract form of the flexible datapath. The proposed FCU enables intra-template operation chaining by fusing the additions performed before/after the multiplication and performs any partial operation template of the following complex operations: W∗ = A × (X∗ + Y∗) + K∗ (1) and W∗ = A × K∗ + (X∗ + Y∗) (2).
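The two complex operations (1) and (2) can be sketched as a small behavioural reference model. This is a minimal Python sketch, not the authors' hardware description: it assumes a CS operand is a (carry, sum) pair whose value is carry + sum, it ignores the 16-bit width and any truncation, and the names `CS`, `value`, `template_1` and `template_2` are hypothetical.

```python
from typing import NamedTuple


class CS(NamedTuple):
    """Carry-save operand (hypothetical type): represented value is carry + sum."""
    carry: int
    sum: int


def value(x: CS) -> int:
    """Collapse a carry-save pair into the ordinary integer it represents."""
    return x.carry + x.sum


def template_1(a: int, x: CS, y: CS, k: CS) -> int:
    """Equation (1): W = A * (X + Y) + K."""
    return a * (value(x) + value(y)) + value(k)


def template_2(a: int, x: CS, y: CS, k: CS) -> int:
    """Equation (2): W = A * K + (X + Y)."""
    return a * value(k) + (value(x) + value(y))


# X = 7, Y = 2, K = 5, each held as a (carry, sum) pair; A = 3 is plain binary.
x, y, k = CS(3, 4), CS(1, 1), CS(2, 3)
print(template_1(3, x, y, k))  # 3 * (7 + 2) + 5 = 32
print(template_2(3, x, y, k))  # 3 * 5 + (7 + 2) = 24
```

A model like this is only useful as a golden reference when checking a hardware simulation; it says nothing about how the FCU avoids carry propagation internally.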
  • 4. Figure 2: FCU. The following relation holds for all CS data: X∗ = {XC, XS} = XC + XS. The operand A is a two’s complement number. The alternative execution paths in each FCU are selected by properly setting the control signals of the multiplexers MUX1 and MUX2 (Fig. 2). The multiplexer MUX0 outputs Y∗ when CL0 = 0 (i.e., X∗ + Y∗ is carried out) or the 1’s complement of Y∗ when X∗ − Y∗ is required and CL0 = 1. The two’s complement 4:2 CS adder produces N∗ = X∗ + Y∗ when the input carry equals 0 or N∗ = X∗ − Y∗ when the input carry equals 1. MUX1 determines whether N∗ (as in (1)) or K∗ (as in (2)) is multiplied by A. MUX2 specifies whether K∗ (as in (1)) or N∗ (as in (2)) is added to the multiplication product. The multiplexer MUX3 accepts the output of MUX2 and its 1’s complement and outputs the former when an addition with the multiplication product is required (i.e., CL3 = 0) or the latter when a subtraction is carried out (i.e., CL3 = 1). The 1-bit "ace" for the subtraction (the +1 that completes the two’s complement negation) is added in the CS adder tree. Figure 3: FCU template library.
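The add/subtract step in front of the multiplier can be illustrated at the bit level. The following is a minimal Python sketch under stated assumptions, not the circuit of Fig. 2: the 4:2 adder is modelled as two chained 3:2 compressors, the word width `W = 17` is chosen only for illustration, and the names `csa`, `add_sub_4to2` and `cs_to_bin` are hypothetical. Because both words of Y∗ are complemented for subtraction, the sketch injects one unit correction per complemented word into the free least significant bit of each shifted carry vector; the text only states that the input carry equals 1, so how the hardware injects these corrections is an assumption here.

```python
W = 17                 # word width assumed for this sketch
MASK = (1 << W) - 1


def csa(a, b, c):
    """3:2 compressor: returns (carry, sum) with carry + sum == a + b + c (mod 2**W)."""
    s = a ^ b ^ c
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK  # carries move one bit left
    return cy, s


def add_sub_4to2(x, y, cl0):
    """x and y are (carry, sum) pairs; cl0 = 0 gives X + Y, cl0 = 1 gives X - Y."""
    xc, xs = x
    yc, ys = y
    if cl0:  # MUX0 behaviour: select the 1's complement of both words of Y
        yc, ys = ~yc & MASK, ~ys & MASK
    c1, s1 = csa(xc, xs, yc)
    c1 |= cl0          # bit 0 of a shifted carry vector is free: inject +1
    c2, s2 = csa(c1, s1, ys)
    c2 |= cl0          # second +1, one per complemented word
    return c2, s2      # result stays in carry-save form


def cs_to_bin(n):
    """Carry-propagate conversion to a signed integer (the role of CStoBin)."""
    v = (n[0] + n[1]) & MASK
    return v - (1 << W) if v >> (W - 1) else v


x, y = (5, 9), (2, 4)                     # X = 14, Y = 6
print(cs_to_bin(add_sub_4to2(x, y, 0)))   # 20
print(cs_to_bin(add_sub_4to2(x, y, 1)))   # 8
```

Note that no carry ripples across the word inside `add_sub_4to2`; the only carry-propagating addition is the final `cs_to_bin`, which is the conversion the architecture tries to perform as rarely as possible.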
  • 5. The multiplier comprises a CS-to-MB module, which adopts a recently proposed technique to recode the 17-bit P∗ into its respective modified Booth (MB) digits with minimal carry propagation. The multiplier’s product consists of 17 bits. The multiplier includes a compensation method for reducing the error imposed on the product’s accuracy by the truncation technique. However, since all the FCU inputs consist of 16 bits and provided that there are no overflows, the 16 most significant bits of the 17-bit W∗ (i.e., the output of the Carry-Save Adder (CSA) tree, and thus of the FCU) are inserted in the appropriate FCU when requested. Advantages: high degree of computational density; reduced area; reduced power. Software implementation: ModelSim; Xilinx ISE.
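To make the notion of MB digits concrete, the sketch below shows conventional radix-4 modified Booth recoding of an ordinary two's complement number in Python. This is a swapped-in, textbook technique: the CS-to-MB module described above recodes a carry-save operand directly, which this sketch does not attempt. The function names `mb_digits` and `mb_multiply` and the default 16-bit width are assumptions made for illustration.

```python
def mb_digits(p, bits=16):
    """Radix-4 modified Booth digits of a `bits`-bit two's complement value.

    Returns digits least significant first, each in {-2, -1, 0, 1, 2}.
    `bits` must be even.
    """
    u = p & ((1 << bits) - 1)   # two's complement bit pattern of p
    digits = []
    prev = 0                    # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def mb_multiply(p, a, bits=16):
    """Multiply a by p as a sum of partial products, one per MB digit of p."""
    return sum(d * a * 4 ** j for j, d in enumerate(mb_digits(p, bits)))


print(mb_digits(-7))         # [1, -2, 0, 0, 0, 0, 0, 0]
print(mb_multiply(-7, 25))   # -175
```

Each digit selects one of 0, ±A or ±2A as a partial product, so a 16-bit operand needs eight partial products instead of sixteen, which is why MB recoding is attractive in front of a CSA tree.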