Riya Saini, R.D. Daruwala / International Journal of Engineering Research and Applications (IJERA)
ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 1, January-February 2013, pp. 1676-1679

Efficient Implementation of Pipelined Double Precision Floating Point Multiplier

Riya Saini* and R.D. Daruwala**
*M.Tech Student, Department of Electrical Engineering, Veermata Jijabai Technological Institute, Matunga, Mumbai-19
**Professor & Dean, Department of Electrical Engineering, Veermata Jijabai Technological Institute, Matunga, Mumbai-19


ABSTRACT
Floating-point numbers are widely adopted in many applications due to their dynamic representation capabilities. A floating-point representation retains its resolution and accuracy better than a fixed-point representation. Unfortunately, floating-point operators require excessive area (or time) in conventional implementations. The creation of floating-point units under a collection of area, latency, and throughput constraints is therefore an important consideration for system designers. This paper presents the implementation of a general-purpose, scalable architecture used to synthesize floating-point multipliers on FPGAs. Although several implementations of floating-point units targeted at FPGAs have been reported previously, most of them are customized for specific applications. This paper presents a fully parallel floating-point multiplier compliant with the IEEE 754 Standard for Floating-Point Arithmetic. The design offers low latency and high throughput. A comparison between our results and some previously reported implementations shows that our approach, in addition to being scalable, yields multipliers with significant improvements in area and speed. We implemented our algorithms using Xilinx ISE 12.1, with the Xilinx Virtex-II Pro XC2VP100 FPGA as our target device.

Keywords – Floating Point Multiplier, FPGA, VHDL.

1. INTRODUCTION
Field-programmable gate arrays (FPGAs) have long been attractive for accelerating fixed-point applications. Early on, FPGAs could deliver tens of narrow, low-latency fixed-point operations. As FPGAs matured, the amount of parallelism to be exploited grew rapidly with FPGA size. This was a boon to many application designers, as it enabled them to capture more of the application. It also meant that the performance of FPGAs was growing faster than that of CPUs [3].

The design of floating-point applications for FPGAs is quite different: the inherent complexity of floating-point arithmetic makes it difficult to map onto FPGA resources. With the introduction of high-level languages such as VHDL, rapid prototyping of floating-point units has become possible, and elaborate simulation and synthesis tools at a higher design level help the designer produce a more controllable and maintainable product.

The IEEE standard for binary floating-point arithmetic [2] provides a detailed description of the floating-point representation and the specifications for the various floating-point operations. It also specifies how special cases and exceptions are to be handled.

This paper describes the implementation of a pipelined floating-point multiplier in VHDL and its synthesis for a Xilinx Virtex-II FPGA using Xilinx's Integrated Software Environment (ISE) 12.1. Pipelining is one of the most popular methods of realizing a high-performance computing platform; it is a technique in which the executions of multiple instructions are overlapped. In the top-down design approach, four arithmetic modules (addition, subtraction, multiplication, and division) are combined to form the floating-point ALU, and the pipeline modules are independent of each other.

The organization of this paper is as follows. Section 2 gives background on floating-point unit design. Section 3 describes the algorithm for floating-point multiplication. Section 4 describes our approach to multiplying floating-point numbers. The simulation environment and results are presented in Section 5, and concluding remarks are given in Section 6.

2. BACKGROUND
2.1. Floating Point Representation
Standard floating-point numbers are represented using an exponent and a mantissa in the following format:

    value = (-1)^(sign bit) × mantissa × base^(exponent + bias)

The mantissa is a binary, positive fixed-point value. Generally, the fixed point is located after the first bit, m0, so that mantissa = {m0.m1m2...mn}, where mi is the i-th bit of the mantissa. The floating-point number is "normalized" when m0 is one.




The exponent, combined with a bias, sets the range of representable values. A common value for the bias is -2^(k-1), where k is the bit-width of the exponent [4]. The double-precision floating-point format has an 11-bit exponent and a 52-bit mantissa plus a sign bit, which provides a wide dynamic range.

Figure 1. 64-bit Floating Point Format

The IEEE floating-point standard makes floating-point unit implementations portable and the precision of the results predictable. A variety of different circuit structures can be applied to the same number representation, offering flexibility.
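
The field boundaries of Figure 1 translate directly into VHDL slices. The following sketch is our own illustration rather than code from the paper (all identifiers are assumptions), and, like the paper, it assumes normalized operands when restoring the implied leading '1':

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Field slicing for the 64-bit double-precision word of Figure 1:
-- bit 63 = sign, bits 62..52 = exponent, bits 51..0 = fraction.
entity fp64_unpack is
  port (
    x        : in  std_logic_vector(63 downto 0);
    sign     : out std_logic;
    exponent : out std_logic_vector(10 downto 0);
    mantissa : out std_logic_vector(52 downto 0)  -- with hidden bit restored
  );
end entity;

architecture rtl of fp64_unpack is
begin
  sign     <= x(63);
  exponent <= x(62 downto 52);
  -- A normalized operand has an implied leading '1' before the stored
  -- fraction; denormals (exponent all zeros) are not handled here.
  mantissa <= '1' & x(51 downto 0);
end architecture;
```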

2.2. Floating Point Implementations in FPGAs
Several efforts have been made to build floating-point units using FPGAs. These approaches have generally explored bit-width variation as a means of controlling precision. A floating-point library containing units with parameterized bit-widths was described in [6]. In this library, the mantissa and exponent bit-widths can be customized. The library also includes a unit that converts between fixed-point and floating-point numbers. The floating-point units are arranged in fixed pipeline stages.

Several researchers [7, 9, 10] have implemented FPGA floating-point adders and multipliers that meet the IEEE 754 floating-point standard. Most commercial floating-point libraries provide units that comply with the IEEE 754 standard [7]. Luk [8] showed that, in order to cover the same dynamic range, a fixed-point design must be five times larger and 40% slower than a corresponding floating-point design. In contrast to earlier approaches, our floating-point unit generation tool automates floating-point unit creation.

3. FLOATING POINT MULTIPLICATION ALGORITHM
The aim in developing the floating-point multiplier routine was to pipeline the unit so that it produces a result every clock cycle. Pipelining the multiplier increased its speed, but it increased the area as well. Different coding structures were therefore tried in the VHDL code used to program the Xilinx chips in order to minimize size.

3.1 Algorithm
The multiplier structure is organized as a three-stage pipeline. This arrangement allows the system to produce one result every clock cycle once the first three values have entered the unit. Figure 2 shows a flow chart of the multiplier structure.

Figure 2. Floating Point Algorithm

The two mantissas are to be multiplied, and the two exponents are to be added. Floating-point multiplication therefore follows a simple algorithm (a behavioral sketch follows the list):
1. Add the exponents and subtract the bias.
2. Multiply the mantissas and determine the sign of the result.
3. Normalize the resulting value, if necessary.
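
The three steps can be stated compactly in behavioral VHDL. The sketch below is our own illustration, not the paper's code: port and variable names are assumptions, operands are assumed normalized and non-zero, and the overflow, underflow, and zero cases handled by the pipeline modules of Section 4 are omitted. It uses the IEEE 754 double-precision bias of 1023.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp64_mul_behav is
  port (
    a, b    : in  std_logic_vector(63 downto 0);
    product : out std_logic_vector(63 downto 0)
  );
end entity;

architecture behav of fp64_mul_behav is
begin
  process (a, b)
    variable exp_sum : unsigned(12 downto 0);   -- 11 bits + 2 guard bits
    variable m_a, m_b : unsigned(52 downto 0);  -- mantissas with hidden bit
    variable m_prod   : unsigned(105 downto 0); -- full 53 x 53-bit product
  begin
    -- Step 1: add the exponents and subtract the bias (both stored
    -- exponents are biased, so the raw sum carries the bias twice).
    exp_sum := resize(unsigned(a(62 downto 52)), 13)
             + resize(unsigned(b(62 downto 52)), 13) - 1023;
    -- Step 2: multiply the mantissas (hidden bits restored); the sign
    -- of the result is the XOR of the operand signs.
    m_a    := unsigned('1' & a(51 downto 0));
    m_b    := unsigned('1' & b(51 downto 0));
    m_prod := m_a * m_b;
    -- Step 3: a product of two values in [1, 2) lies in [1, 4), so at
    -- most one right shift renormalizes it.
    if m_prod(105) = '1' then
      m_prod  := shift_right(m_prod, 1);
      exp_sum := exp_sum + 1;
    end if;
    product <= (a(63) xor b(63))                        -- sign
             & std_logic_vector(exp_sum(10 downto 0))   -- biased exponent
             & std_logic_vector(m_prod(103 downto 52)); -- fraction, truncated
  end process;
end architecture;
```

As in the paper, the low-order product bits are simply truncated, which avoids a rounding adder at the cost of some accuracy.
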
4. PIPELINED FLOATING POINT MULTIPLICATION MODULE
The multiplication entity has three 64-bit inputs and two 64-bit outputs. A selection input is used to enable or disable the entity. The multiplication module is divided into check-zero, add-exponent, multiply-mantissa, check-sign, and normalize-and-concatenate modules, which execute concurrently. A status signal indicates special result cases such as overflow, underflow, and a zero result.

In this paper, the pipelined floating-point multiplication is divided into three stages (Fig. 3).

Figure 3. Pipelined Floating Point Multiplier
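
The paper does not name the entity's ports. One plausible VHDL declaration consistent with the description above (every identifier here is an assumption) is:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Three 64-bit inputs, two 64-bit outputs, a selection input, and a
-- status signal, as described in the text; names are illustrative.
entity fp_multiplier is
  port (
    clk     : in  std_logic;
    sel     : in  std_logic;                      -- enables/disables the entity
    op_a    : in  std_logic_vector(63 downto 0);
    op_b    : in  std_logic_vector(63 downto 0);
    op_c    : in  std_logic_vector(63 downto 0);  -- third 64-bit input per the text
    result  : out std_logic_vector(63 downto 0);
    result2 : out std_logic_vector(63 downto 0);  -- second 64-bit output per the text
    status  : out std_logic_vector(2 downto 0)    -- overflow, underflow, zero
  );
end entity;
```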

Stage 1 checks whether an operand is 0 and reports the result accordingly. Stage 2 determines the product sign, adds the exponents, and multiplies the fractions. Stage 3 normalizes and concatenates the product.

4.1 Check Zero module:
Initially, the two operands are checked to determine whether either contains a zero. If one of the operands is zero, the zero_flag is set to 1 and a zero result is passed through all the stages and output. If neither operand is zero, the inputs in IEEE 754 format are unpacked and assigned to the check sign, add exponent, and multiply mantissa modules. The mantissa is packed with the hidden bit '1'.
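
A minimal sketch of this stage with our own identifiers: an IEEE 754 operand is zero exactly when its exponent and fraction bits are all zero, which makes the test a single wide comparison.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity check_zero is
  port (
    op_a, op_b : in  std_logic_vector(63 downto 0);
    zero_flag  : out std_logic;
    mant_a     : out std_logic_vector(52 downto 0);
    mant_b     : out std_logic_vector(52 downto 0)
  );
end entity;

architecture rtl of check_zero is
  constant ZERO62 : std_logic_vector(62 downto 0) := (others => '0');
begin
  -- zero_flag is 1 when either operand is zero (sign bit ignored).
  zero_flag <= '1' when op_a(62 downto 0) = ZERO62
                     or op_b(62 downto 0) = ZERO62
               else '0';
  -- Otherwise the mantissas are unpacked with the hidden bit '1' restored.
  mant_a <= '1' & op_a(51 downto 0);
  mant_b <= '1' & op_b(51 downto 0);
end architecture;
```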

4.2 Add exponent module:
The module is activated if the zero flag is set; otherwise, zero is passed to the next stage and exp_flag is set to 0. Two extra bits are added to the exponent to indicate overflow and underflow. The resulting sum carries a double bias, so the extra bias is subtracted from the exponent sum. After this, the exp_flag is set to 1.
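
A minimal combinational sketch of this step (identifiers are ours): both stored exponents carry the 1023 bias, so one bias is subtracted from the raw sum, and the two guard bits keep overflow and underflow visible in the top bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_exponent is
  port (
    exp_a, exp_b : in  std_logic_vector(10 downto 0);
    exp_sum      : out std_logic_vector(12 downto 0)  -- 11 bits + 2 guard bits
  );
end entity;

architecture rtl of add_exponent is
begin
  -- The raw sum is doubly biased; subtract one bias to rebias the result.
  exp_sum <= std_logic_vector(
               resize(unsigned(exp_a), 13)
             + resize(unsigned(exp_b), 13)
             - to_unsigned(1023, 13));
end architecture;
```
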
4.3 Multiply mantissa module:
In this stage the zero_flag is checked first. If the zero_flag is set to 0, no calculation or normalization is performed and the mant_flag is set to 0. If both operands are non-zero, the multiplication is carried out with the multiplication operator, and mant_flag is set to 1 to indicate that the operation has executed. The module produces 46 bits, of which the lower-order 32 bits of the product are truncated.
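
The text quotes a 46-bit product with the lower-order 32 bits truncated; for a full 53 x 53-bit double-precision mantissa product, the same truncate-the-low-bits pattern looks as follows (a sketch with our own identifiers):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiply_mantissa is
  port (
    mant_a, mant_b : in  std_logic_vector(52 downto 0);
    mant_prod      : out std_logic_vector(53 downto 0)  -- top 54 bits kept
  );
end entity;

architecture rtl of multiply_mantissa is
begin
  process (mant_a, mant_b)
    variable full : unsigned(105 downto 0);  -- 53 x 53 = 106-bit product
  begin
    full := unsigned(mant_a) * unsigned(mant_b);
    -- Keep the overflow bit, the leading one, and the 52 fraction bits;
    -- the lower-order product bits are truncated rather than rounded.
    mant_prod <= std_logic_vector(full(105 downto 52));
  end process;
end architecture;
```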

4.4 Check sign module:
This module determines the sign of the product of the two operands. The product is positive when the two operands have the same sign; otherwise it is negative. The sign bits are compared using an XOR circuit, and the sign_flag is then set to 1.
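
In code, the sign rule reduces to a single XOR gate (identifiers are ours):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity check_sign is
  port (
    sign_a, sign_b : in  std_logic;
    sign_prod      : out std_logic
  );
end entity;

architecture rtl of check_sign is
begin
  -- The product is negative exactly when the operand signs differ.
  sign_prod <= sign_a xor sign_b;
end architecture;
```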
                                                               Pipeline floating point multiplier design using
4.5 Normalize and concatenate all modules:
This module checks for overflow and underflow after the exponent addition. Underflow occurs if the 9th bit is 1; overflow occurs if the 8th bit is 1. If exp_flag, sign_flag, and mant_flag are all set, normalization is carried out. Otherwise, 32 zero bits are assigned to the output.

If the mantissa's MSB is already 1 during the normalization operation, no normalization is needed: the hidden bit is dropped and the remaining bits are packed and assigned to the output port. Otherwise, the normalization module makes the mantissa's MSB 1 by shifting the mantissa left until a 1 is encountered, decreasing the exponent by 1 for each shift. Once the mantissa's MSB is 1, normalization is complete; the first bit is the implied bit and is dropped, and the remaining bits are packed and assigned to the output port. The final normalized product, with the correct biased exponent, is concatenated with the product sign.
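
A behavioral sketch of the shift-and-decrement normalization described above (identifiers are ours; a non-zero mantissa is assumed, since the zero case is intercepted by the check-zero stage):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity normalize_pack is
  port (
    sign_prod : in  std_logic;
    exp_in    : in  std_logic_vector(10 downto 0);
    mant_in   : in  std_logic_vector(52 downto 0);  -- hidden bit in MSB position
    result    : out std_logic_vector(63 downto 0)
  );
end entity;

architecture behav of normalize_pack is
begin
  process (sign_prod, exp_in, mant_in)
    variable m : unsigned(52 downto 0);
    variable e : unsigned(10 downto 0);
  begin
    m := unsigned(mant_in);
    e := unsigned(exp_in);
    -- Shift left until a 1 reaches the MSB; each shift costs one
    -- exponent decrement. (Bounded loop, so it simulates/synthesizes.)
    for i in 0 to 52 loop
      exit when m(52) = '1';
      m := m(51 downto 0) & '0';
      e := e - 1;
    end loop;
    -- The MSB is now the implied bit and is dropped; the remaining bits
    -- are packed with the product sign and the biased exponent.
    result <= sign_prod & std_logic_vector(e) & std_logic_vector(m(51 downto 0));
  end process;
end architecture;
```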



5. RESULT AND ANALYSIS
The design is verified through simulation, which is done in a bottom-up fashion. The aim in designing the floating-point units was to pipeline each unit a sufficient number of times to maximize speed and minimize area. The simulation output obtained with the Xilinx simulation tool is shown below.

Figure 4. Synthesis result of the multiplier

Figure 5. Simulation result of the multiplier

6. CONCLUSION
A pipelined floating-point multiplier has been successfully designed in VHDL, implemented, and tested. With a pipelined architecture, i.e. concurrent processing, there is less combinational delay, which means a faster response and better throughput with lower latency than sequential processing, but there is a tradeoff between speed and chip area. The pipelined architecture provides a faster response in floating-point multiplication but also consumes more area: the number of slices used on the reconfigurable hardware is higher than with the standard architecture. Pipelining is used to decrease the clock period. With sequential processing the latency is larger, but fewer slices are used on the FPGA than with the pipelined architecture. Currently, we are conducting further research that considers additional
reductions in hardware complexity in terms of synthesis.

REFERENCES
[1] M. de Lorimier and A. DeHon. Floating-point sparse matrix-vector multiply for FPGAs. In Proceedings of the ACM International Symposium on Field Programmable Gate Arrays, Monterey, CA, February 2005.
[2] The Institute of Electrical and Electronic
     Engineers, Inc. IEEE Standard for Binary
     Floating-point Arithmetic. ANSI/IEEE Std 754-
     1985.
[3] K. D. Underwood. FPGAs vs. CPUs: Trends in peak floating-point performance. In Proceedings of the ACM International Symposium on Field Programmable Gate Arrays, Monterey, CA, February 2004.
[4] I. Koren. Computer Arithmetic Algorithms.
     Brookside Court Publishers, Amherst, MA,
     1998.
[5] IEEE Standards Board. IEEE standard for binary
     floating-point arithmetic. Technical Report
     ANSI/IEEE Std. 754-1985, The Institute of
     Electrical and Electronic Engineers, New York,
     1985.
[6] P. Belanović and M. Leeser. A Library of Parameterized Floating Point Modules and Their Use. In Proceedings, International Conference on Field Programmable Logic and Applications, Montpelier, France, Aug. 2002.
[7] B. Fagin and C. Renard. Field programmable
     gate arrays and floating point arithmetic. IEEE
     Transactions on VLSI Systems, 2(3):365–367,
     Sept. 1994.
[8] A. A. Gaffar, W. Luk, P. Y. Cheung, N. Shirazi,
     and J. Hwang. Automating Customization of
     Floating-point    Designs.    In    Proceedings,
     International     Conference       on      Field-
     Programmable       Logic    and     Applications,
     Montpelier, France, Aug. 2002.
[9] W. B. Ligon, S. McMillan, G. Monn, F. Stivers,
     and K. D. Underwood. A Re-evaluation of the
     Practicality of Floating-Point Operations on
     FPGAs. In Proceedings, IEEE Symposium on
     Field-Programmable        Custom      Computing
     Machines, pages 206–215, Napa, CA, Apr. 1998.
[10] L. Louca, W. H. Johnson, and T. A. Cook.
     Implementation of IEEE Single Precision
     Floating Point Addition and Multiplication on
     FPGAs. In Proceedings, IEEE Workshop on
     FPGAs for Custom Computing Machines, pages
     107–116, Napa, CA, Apr. 1996.
[11] E. M. Schwarz, M. Schmookler, and S. D. Trong. FPU implementations with denormalized numbers. IEEE Transactions on Computers, 54(7), July 2005.
[12] Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet. Xilinx, June 2005 (Rev 4.3) [cited Aug 2005].




