Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithmetic Operations

Embedded systems used in real-time applications require low power, less area and a high computation speed. For digital signal processing (DSP), image processing and communication applications, data are often received at a continuously high rate. Embedded processors have to cope with this high data rate and process the incoming data based on specific application requirements. Even though there are many different application domains, they all require arithmetic operations that quickly compute the desired values using a larger range of operation, reconfigurable behavior, low power and high precision. The type of necessary arithmetic operations may vary greatly among different applications. The RTL-based design and verification of one or more of these functions may be time-consuming. Some High Level Synthesis tools reduce this design and verification time but may not be optimal or suitable for low power applications. The developed MATLAB-based Arithmetic Engine improves design time and reduces the verification process, but the key point is to use a unified design that combines some of the basic operations with more complex operations to reduce area and power consumption. The results indicate that using the Arithmetic Engine from a simple design to more complex systems can improve design time by reducing the verification time by up to 62%. The MATLAB-based Arithmetic Engine generates structural RTL code, a testbench, and gives the designers more control. The MATLAB-based design and verification engine uses optimized algorithms for better accuracy at a better throughput.

ISSN (e): 2250 – 3005 || Volume, 06 || Issue, 04||April – 2016 ||
International Journal of Computational Engineering Research (IJCER)
www.ijceronline.com Open Access Journal Page 1
Matlab Based High Level Synthesis Engine for Area And Power
Efficient Arithmetic Operations
Semih Aslan
Ingram School of Engineering Texas State University San Marcos, Texas, 78666, USA
I. Introduction
Today, a significant number of embedded systems are focused on multimedia applications with almost insatiable
demand for low cost, high performance and low power hardware. Designing complex systems such as image
and video processing, compression, face recognition, object tracking, 3G or 4G modems, multi-standard
CODECs, and HD decoding schemes requires the integration of many complex blocks and a long verification
process [1][2]. These complex designs are based on I/O peripherals, one or more processors, bus interfaces,
A/D, D/A, embedded software, memories and sensors. In the past, complete systems were designed with
multiple chips and connected together on PCBs, but with today’s technology, all functions can be incorporated
in a single chip. These complete systems are known as System-on-Chip (SoC) [2].System-on-chip (SoC)
designs are mainly accomplished by using Register Transfer Languages (RTL) such as Verilog and VHDL. RTL
design flow [1] [2] for both FPGA and ASIC is similar and is shown in Figure 1. An algorithm can be converted
to RTL using the behavioral model description method or by using pre-defined IP core blocks. After completing
this RTL code, formal verification must be done before implementation. After implementation of the RTL code,
timing verification needs to be done for proper operation.
RTL design abstracts logic structures, timing and registers [1]. Because of this, every clock change
causes a state change in the design. This timing dependency causes every event to be simulated, which results in
a slower simulation time and longer verification period of the design. The design and verification of an
algorithm in RTL in Figure 1 can take up 50-60% of the “Time to Market” (TTM). The RTL design becomes
impractical for larger systems that have high data flow between the blocks, and it requires millions of gates.
Even though design time may improve by using behavioral modeling and IP cores, the difficulty in synthesis,
poor performance results and rapid changes in the design make IP cores difficult to adapt and change. Therefore,
systems rapidly become obsolete.The limitations of RTL and longer TTM forced designers to think of the
design as a whole system rather than blocks. In addition, software integration in SoC was always done after the
hardware was designed. When the system gets more complex, software integration is desirable during hardware
Abstract
Embedded systems used in real-time applications require low power, less area and a high
computation speed. For digital signal processing (DSP), image processing and communication
applications, data are often received at a continuously high rate. Embedded processors have to
cope with this high data rate and process the incoming data based on specific application
requirements. Even though there are many different application domains, they all require
arithmetic operations that quickly compute the desired values using a larger range of operation,
reconfigurable behavior, low power and high precision. The type of necessary arithmetic
operations may vary greatly among different applications. The RTL-based design and verification
of one or more of these functions may be time-consuming. Some High Level Synthesis tools reduce
this design and verification time but may not be optimal or suitable for low power applications.
The developed MATLAB-based Arithmetic Engine improves design time and reduces the
verification process, but the key point is to use a unified design that combines some of the basic
operations with more complex operations to reduce area and power consumption. The results
indicate that using the Arithmetic Engine from a simple design to more complex systems can
improve design time by reducing the verification time by up to 62%. The MATLAB-based
Arithmetic Engine generates structural RTL code, a testbench, and gives the designers more
control. The MATLAB-based design and verification engine uses optimized algorithms for better
accuracy at a better throughput.
Keywords: FPGA, High Level Synthesis, MATLAB, Optimized Hardware, Power Efficient, RTL.
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 2
implementation. Over the last two decades, designers were forced to find new methods to replace RTL due to
improvements in SoC and shorter TTM. Because of the extensive work done in Electronics System Level
Design (ESLD), HW/SW co-design of a system and High Level Synthesis (HLS)[2][4] are integrated into
FPGA and ASIC design flow.
Algorithm
RTL Timing
Translate
Map
Place&Route
IMPLEMENT
Bit
File
Formal Proof
Logic
Synthesis
Figure 1. FPGA RTL level synthesis flow
The next section will describe the proposed MATLAB HLS Arithmetic (MHA) Engine design and
implementation. Section III and IV will focus on the error analysis and testbench generation respectively and the
conclusion will describe future work and improvements.
II. Mha Engine
RTL description of a system can be implemented from a behavioral description of the system in Perl,
C, Python and MATLAB. This will result in a faster verification process and shorter TTM. It is also possible to
have a hybrid design where RTL blocks can be integrated with HLS [2].The HLS design flow shows that a
group of algorithms that represent the whole system or parts of a system can be implemented using a high level
language such as Perl, C, C++ , Java, MATLAB [2][5]. Each part in the system can be tested independently
before the whole system is tested. During this testing process, the RTL testbenches may also be generated. After
testing is complete, the system can be partitioned into HW and SW. This enables SW designers to join the
design process during HW design; in addition, RTL can be tested by using both HW/SW together. After the
verification process, the design can be implemented using FPGA synthesis tools.The integration of HLS into
FPGA design flow is shown in Figure 2.
MATLAB
Algorithm
High Level
Synthesis
(HLS)
Timing
Translate
Map
Place&Route
IMPLEMENT
Bit
File
Formal Proof
RTL
MHA Engine
Figure 2. FPGA high level synthesis flow with MHA Engine
Since the early days of VLSI design, application-specific hardware has been used for optimal
implementation of algorithms. This approach is considered the fastest design scheme but is also the most area
consuming system due to the inherently redundant nature of a design that only computes one operation.
However, there are other possible designs for DSP implementations that can be used for two or more operations
[1][3]. This design approach consists of processing blocks that can compute multiple operations using dedicated
hardware designed for a particular cluster of operations. An improved design approach should exploit the
redundancy and common elements that exist among the sub-blocks. This would result in shared building blocks
and dramatically reduced hardware requirements.
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 3
The proposed design focuses on designing a large system that will be faster with a design principle
similar to HLS, asexplained above. The main work focuses on multi-purpose, reused hardware structures that
produce a unified, area efficient reconfigurable system. This design can reduce the area by 64% [2][[6]. The
components of the MHA Engineare shown in Figure 3.
Arithmetic Operation
. Addition/Subtraction
. Multiplication
. Division
. Square Root
. Inverse Square Root
. Sin(θ), Cos(θ), Tan(θ), Cot(θ)
. Sinh(θ), Cosh(θ), Tanh(θ)
. ex
ControlI/O
MHE Engine
Figure 3. MATLAB Based HLS Arithmetic Engine (MHA) block
The MHA block has three important principles:
 Compute required arithmetic operations
 Customized range and accuracy
 Generate an area-efficient, fast system for low power applications
The MHA accepts inputs from the user via two GUIs to make it more user friendly and efficient. The
“Main” GUI that is shown in Figure 4 below includes the following sections:
 FPGA or ASIC support
 Vendor based IP Core support
 Project Name (default is c:MHAMHA)
 Top Module Name (default is MHA)
 Language – Verilog or VHDL (design and verifications – current system only supports Verilog HDL)
 Rounding - Truncation or RNE (Rounding cannot be done without a selection)
 Number system – Fixed or Floating Point (Fixed point up to 64-bit - current system only supports Fixed
Point Number system)
Figure 4. Main GUI for MHA Engine
 Signed or unsigned number systems
 Target – Frequency and throughput
 Area or speed based optimization
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 4
 Testbench generation
o Automated testbench with MATLAB
o Modelsim .do file for fast automation
o Automated testbench file for Modelsim
o Error comparison with MATLAB
o User defined test data option
The “Arithmetic” tab shown in Figure 5 has the following sections:
 Basic arithmetic operations
o Addition/Subtraction (Area or speed optimized based on Ripple-Carry Adder (RCA) or Carry-Lookahead
Adder (CLA))
o Multiplication (Array or Booth multiplier)
 Advanced arithmetic operations
o Division (Newton-Raphson, Goldschmidt, or CORDIC)
o Square Root (Newton-Raphson, Goldschmidt, or CORDIC)
o Inverse Square Root (Newton-Raphson, Goldschmidt, or CORDIC)
 Elementary functions
o Trigonometric functions – sine, cosine, tangent, and cotangent (table method, CORDIC or polynomial
based design)
o Hyperbolic functions – sinh, cosh, tanh (table method, CORDIC or polynomial based design)
o Exponential function – exp(x) (table method, CORDIC or polynomial based design)
Figure 5. Arithmetic operations GUI
The MHA uses a bottom-up design process that starts with the elementary functions and then moves to
the simplest arithmetic operations such as multiplication and addition. This is shown in Figure 6 below.
This design flow contains the following procedure: First, selection of elementary functions [7], selection of
basic arithmetic operations, and generation of area efficient hardware for FPGA and VLSI. There are 2-64 bit
selections that are suitable for a vast variety of applications with the requested precision. The section’s addition
and multiplications are used based on the previous designs. Division, inverse square root and square roots are
designed based on the same architecture, and the modified design reduces the area by 64% [8]. Next, the
CORDIC [9] [10] or polynomial methods [4] are used to calculate elementary functions [7] [11]. This area-
efficient design is optimized for speed by implementing a smart control system. For performance evaluation and
synthesis are implemented with Xilinx FPGAs [12] and Microwind [13] VLSI design tool.
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 5
Arithmetic Block
Input
Parameters
Elementary
Functions
Arithmetic
HDL
Testbench
Precision
Figure 6. The MHA design flow
III. Error Analysis
When designing hardware with many arithmetic operations, one of the most important objectives is to
produce results that have a minimal absolute and average computation error. Arithmetic operations in digital
systems generally introduce three types of errors: number representation, rounding, and algorithmic or design
error [14][15].The exact representation of some numbers or events in radix-n may not be possible due to the
limitations in ADCs, the sampling rate, and the number of available bits. In addition, many numbers cannot be
converted from radix-n to radix-m without an error. For example, number 0.1 and 0.2 in radix-10 cannot be
represented in radix-2 without an error. This error can be reduced by increasing the number of bits. The
reduction of this error with respect to the number of bits is shown in Figure 7.
Figure 7. Error representation of number radix conversion
Figure 8 shows the error generated for radix-10 to radix-2 conversion of 128 fractional numbers (fractional part
of 30 bits).
Figure8. Random 128-number conversion error from Radix-10 to Radix-2
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 6
During and after the calculation of certain arithmetic operations, the total number of bits may exceed
the number of bits available; these values need to be rounded. For example, multiplying two n-bit numbers
produces a product of 2n-bit and this result may need to be represented with n-bit.If there were more
multiplications on the design path, the number of bits would increase in a linear fashion. To prevent this, each
multiplier output needs to be rounded. There are a few ways to implement rounding in hardware, with the most
commonly used methods being round to the nearest even, round towards zero (truncation), round down (floor),
round up (ceiling) and round away from zero [14].
In this section, truncation (TRA)[14] and round to the nearest even (RNE)[14] schemes are compared. During
these rounding operations, an error value is introduced. The RNE and TRA and their error values are shown in
Table 1.
Table 1. RNE and TRA
Number RNE TRA
Rounded Value Error Rounded Value Error
X0.00 X0. 0.00 X0. 0.00
X0.01 X0. 0.25 X0. 0.25
X0.10 X0. 0.50 X0. 0.50
X0.11 X0.+ulp -0.25 X0. 0.75
X1.00 X1. 0.00 X1. 0.00
X1.01 X1. 0.25 X1. 0.25
X1.10 X1.+ulp -0.50 X1. 0.50
X1.11 X1.+ulp -0.25 X1. 0.75
Total --- 0 --- 3.00
Figure 9 shows advantage of RNE over TRA when average error is considered. The requested number of
precision and selected rounding scheme can affect the size of the hardware and overall speed and throughput.
Users can change the selected accuracy to see area and throughput estimates simultaneously without increasing
the overall design time. This will make it possible to select the optimal design for synthesis. The MHA Engine
can create hardware for precision based on increase the bit size during the mid-operation and apply rounding
before the output stage.
Figure 9. 32-Bit to 16-Bit Rounding Errors with RNE and TRA
Iv. Testbench Generation
One of the most important and complicated sections of the MHAEngine is generation of the testbench
files and error checking using MATLAB and Modelsim. Before the RTL code is synthesized, it can be tested
using a testbench that is created using MATLAB and Modelsim. The testbench generation and error checking
block diagram is shown in Figure 10 below.
MODELSIM
USER
CONSTRAINTS
MATLAB
VERILOG
TESTBENCH
RTL
DUT
RESULTS
(text file)
Verification and
Error Files
Figure 10. MATLAB and Modelsim flow
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 7
After generation of the design and testbench files,the user defined test vectors need to be generated. The
first step is to generate positive random numbers [0,1]. These numbers must be converted into positive and
negative numbers based on signed numbers. To generate random test values and test results, the following
procedure is followed:
 Get the user defined testvector number n.
 Generate random test vectors (T{n})
 Generate random binary numbers using MATLAB
o For fixed-point signed numbers:
T_Bin=dec2bin(T*2^n,n)
o For fixed-point unsigned numbers:
T_Bin=dec2bin(T*2^(n-i),n)
 Generate the Modelsim testbench file and get results
 Compare results using MATLAB
After generation of the design, testbench, and test vector files, the next step is to generate a Modelsim
tcl .do file that can be transferred into Modelsim all together. The .do file will generate the project file and will
import all design files, including testbench, into Modelsim. This will run all files and generate the results as a
text file. Once Modelsim-generated results are imported into MATLAB, correct operation and error analysis
needs to be performed. An important issue which needs to be addressed during the verification process is
working with negative fixed-point numbers in MATLAB. It is important because it does not convert negative
binary numbers and binary floating numbers to a decimal number. This problem is addressed using the
MATLAB codes given in Figure 11.
Figure 11. Signed binary to decimal conversion
V. Conclusion
An area efficient, MATLAB based HLS engine for arithmeticoperations is designed for low power and
high-speed applications. The MHA Engine decreases design system time and verification by up to 64% without
compromising speed and efficiency. The MHA Engine uses a smart control system that is optimized based on
the desired operations. The MHA Engine is a bridge between RTL and HLS. It uses RTL-based basic blocks to
design most complicated arithmetic operations using structural model design and HLS-style fast and optimized
verification. Any designed system can be reconfigured at any time in any way in MHA Engine without going
through the same design and verification hassle.
MATLAB-based verification makes it possible to use all the features of MATLAB for faster and more efficient
verification. The MHA Engine can be easily reconfigurable to systems available at any level, due to changes in
the computer system and software.
As explained above, this system generates area efficient fast arithmetic and elementary functionsthat
can be used over a wide area of applications in DSP, image processing, and communication systems. It can be
used for FFT, DCT and DWT calculations and Chirplet transforms [16][17][18].
It can also be very important for educational institutions in order to test their systems using the verification
testbench that, when desired, works as an independent design tool. Overall, this work will forge the way for
those who need to make sudden changes in their systems and need fast verification. They can adopt and apply
any changes using the MHA Engine or generate similar systems for faster design and verification. In addition,
because the MHA Engine generated code is designed structurally, code can be changed easily at any level, if so
desired. Future workwill integrate VHDL code and IEEE 754 floating point numbers (both single and double
precision) implementation. Another future goal is to make this platform totally open source by using only
Iverilog and replacing MATLAB with Octave.
Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations
www.ijceronline.com Open Access Journal Page 8
References
[1]. Hendry, D.C., and A.A. Duncan. "Area Efficient DSP Datapath Synthesis." Design Automation Conference (1995): 130-135.
[2]. Aslan, S., Oruklu, E., and Saniie J., “A high-level synthesis and verification tool for fixed to floating point conversion”, IEEE
International Midwest Symposium on Circuits and Systems, 2012, Pages, 908-911.
[3]. Andrieux, J., M. Feix, G. Mourgues, P. Bertrand, B. Izrar, and V. Nguyen. "Optimum Smoothing of the Wigner Ville Distribution."
IEEE Transactions on Acoustics, Speech, and Signal Processing 36.5(1987): 764-769.
[4]. Kilts, S. Advanced FPGA Design Architecture, Implementation, and Optimizations. New York: Wiley Inter-Science, 2007.
[5]. Chen, W. The VLSI Handbook. Boca Raton: CRC Publisher, 2007.
[6]. Dehon, A., and S. Hauck. Reconfigurable Computing The Theory and Practice of FPGA-Based Computing. Burlington,
Massachusetts: Elsevier, 2008.
[7]. Walther, J.S. "A unified Algortihm for Elementary Functions." American Federation of Information Processing Societies Joint
Computer Conferences (1971): 379-385.
[8]. Oruklu, E., J. Saniie, and S. Aslan. "Realization of Area Efficient QR Factorization using Unified Diivision, Square Root and
Inverse Sqaure Root Hardware." IEEE Electro/Information Technology (2009): 245-250.
[9] Volder, J. "The CORDIC Trigonometric Computing Technique." IEEE Transactions Electronic Computers 8.3 (1959): 330-334.
[9]. 10] Striling, W. C., and T. K. Moon. Mathematical Methods and algorithms for Signal Processing. New Jersey: Prentice
Hall, 2000.
[10]. Xilinx. (2016), http://www.xilinx.com/
[11]. Microwind (2106) http://microwind.net/
[12]. Stine, J. E. "Digital Computer Arithmetic Datapath Design using Verilog HDL." Digital Computer Arithmetic Datapath Design
using Verilog HDL. Norwell, Massachusetts: Kluwer Academic Publishing, 2004.
[13]. Teukolsky, S. A., W. T. Vetterling, B. P. Flannery, and W. H. Press. "Numerical Recipes: The Art of Scientific Computing."
Numerical Recipes: The Art of Scientific Computing, 3rd ed. New York, New York: Cambridge University Press, 2007.
[14]. Omar, J., E. E. Swartzlander Jr., and M. J. Schulte. "Optimal Initial Approximations for the Newton-Raphson Division Algorithm."
Springer-Verlag Journal of Computing 53.3-4 (1994): 233-242.
[15]. Seidel, P-M, W. E. Ferguson, and G. Even. "A Parametric Error Analysis of Goldschmidt’s Division Algorithm." Journal of
Computer and System Sciences 70.1 (2005): 118-139.
[16]. Chunduri, K. C. "Implementation of Adaptive Filter Structures on a Fixed Point Signal Processor for Acoustical Noise Reduction"
(2006).

Recomendados

Implementation of Radix-4 Booth Multiplier by VHDL por
Implementation of Radix-4 Booth Multiplier by VHDLImplementation of Radix-4 Booth Multiplier by VHDL
Implementation of Radix-4 Booth Multiplier by VHDLpaperpublications3
74 vistas11 diapositivas
Proposal for google summe of code 2016 por
Proposal for google summe of code 2016 Proposal for google summe of code 2016
Proposal for google summe of code 2016 Mahesh Dananjaya
427 vistas7 diapositivas
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX... por
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...Design World
2.6K vistas49 diapositivas
Focus - GSM UMTS LTE Performance and Configuration Management Solution por
Focus - GSM UMTS LTE Performance and Configuration Management SolutionFocus - GSM UMTS LTE Performance and Configuration Management Solution
Focus - GSM UMTS LTE Performance and Configuration Management SolutionAhmet Ozturk
1.2K vistas27 diapositivas
Ethercat twincat e por
Ethercat twincat eEthercat twincat e
Ethercat twincat eRona Afdalana
809 vistas4 diapositivas
Background And An Architecture Example por
Background And An Architecture ExampleBackground And An Architecture Example
Background And An Architecture ExampleGlen Wilson
1.4K vistas30 diapositivas

Más contenido relacionado

La actualidad más candente

FT Architecture For Cloud Service Computing por
FT Architecture For Cloud Service ComputingFT Architecture For Cloud Service Computing
FT Architecture For Cloud Service Computingdestruck
585 vistas16 diapositivas
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ... por
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...IRJET Journal
35 vistas7 diapositivas
[Capella Day 2019] Model execution and system simulation in Capella por
[Capella Day 2019] Model execution and system simulation in Capella[Capella Day 2019] Model execution and system simulation in Capella
[Capella Day 2019] Model execution and system simulation in CapellaObeo
917 vistas25 diapositivas
참여기관_발표자료-국민대학교 201301 정기회의 por
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의DzH QWuynh
224 vistas95 diapositivas
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat... por
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...Arun Joseph
95 vistas6 diapositivas
An Approach to Overcome Modeling Inaccuracies for Performance Simulation Sig... por
An Approach to Overcome Modeling  Inaccuracies for Performance Simulation Sig...An Approach to Overcome Modeling  Inaccuracies for Performance Simulation Sig...
An Approach to Overcome Modeling Inaccuracies for Performance Simulation Sig...Pankaj Singh
176 vistas35 diapositivas

La actualidad más candente(20)

FT Architecture For Cloud Service Computing por destruck
FT Architecture For Cloud Service ComputingFT Architecture For Cloud Service Computing
FT Architecture For Cloud Service Computing
destruck585 vistas
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ... por IRJET Journal
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
Performance Evaluation of FPGA Based Runtime Dynamic Partial Reconfiguration ...
IRJET Journal35 vistas
[Capella Day 2019] Model execution and system simulation in Capella por Obeo
[Capella Day 2019] Model execution and system simulation in Capella[Capella Day 2019] Model execution and system simulation in Capella
[Capella Day 2019] Model execution and system simulation in Capella
Obeo917 vistas
참여기관_발표자료-국민대학교 201301 정기회의 por DzH QWuynh
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의
DzH QWuynh224 vistas
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat... por Arun Joseph
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...
Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generat...
Arun Joseph95 vistas
An Approach to Overcome Modeling Inaccuracies for Performance Simulation Sig... por Pankaj Singh
An Approach to Overcome Modeling  Inaccuracies for Performance Simulation Sig...An Approach to Overcome Modeling  Inaccuracies for Performance Simulation Sig...
An Approach to Overcome Modeling Inaccuracies for Performance Simulation Sig...
Pankaj Singh176 vistas
Capella annual meeting 2021 por Obeo
Capella annual meeting 2021Capella annual meeting 2021
Capella annual meeting 2021
Obeo446 vistas
Overcoming challenges of_verifying complex mixed signal designs por Pankaj Singh
Overcoming challenges of_verifying complex mixed signal designsOvercoming challenges of_verifying complex mixed signal designs
Overcoming challenges of_verifying complex mixed signal designs
Pankaj Singh158 vistas
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop) por Apache Apex
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
Apache Apex9.3K vistas
Instruction level parallelism por deviyasharwin
Instruction level parallelismInstruction level parallelism
Instruction level parallelism
deviyasharwin2.8K vistas
Evaluation of morden computer & system attributes in ACA por Pankaj Kumar Jain
Evaluation of morden computer &  system attributes in ACAEvaluation of morden computer &  system attributes in ACA
Evaluation of morden computer & system attributes in ACA
Pankaj Kumar Jain5.9K vistas
Provisioning Bandwidth & Logical Circuits Using Telecom-Based GIS . por SSP Innovations
Provisioning Bandwidth & Logical Circuits Using Telecom-Based GIS.Provisioning Bandwidth & Logical Circuits Using Telecom-Based GIS.
Provisioning Bandwidth & Logical Circuits Using Telecom-Based GIS .
SSP Innovations1.2K vistas
Qualifying a high performance memory subsysten for Functional Safety por Pankaj Singh
Qualifying a high performance memory subsysten for Functional SafetyQualifying a high performance memory subsysten for Functional Safety
Qualifying a high performance memory subsysten for Functional Safety
Pankaj Singh202 vistas
Network Planning & Design: An Art or a Science? por Vishal Sharma, Ph.D.
Network Planning & Design: An Art or a Science?Network Planning & Design: An Art or a Science?
Network Planning & Design: An Art or a Science?
Vishal Sharma, Ph.D.1.6K vistas
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools por Vishal Sharma, Ph.D.
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) ToolsA Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
Vishal Sharma, Ph.D.2.7K vistas
Pipelining and ILP (Instruction Level Parallelism) por A B Shinde
Pipelining and ILP (Instruction Level Parallelism) Pipelining and ILP (Instruction Level Parallelism)
Pipelining and ILP (Instruction Level Parallelism)
A B Shinde10.1K vistas
Teklabz schematics generator por Ramzi Qaqish
Teklabz schematics generatorTeklabz schematics generator
Teklabz schematics generator
Ramzi Qaqish676 vistas

Similar a Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithmetic Operations

Cse viii-advanced-computer-architectures-06cs81-solution por
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionShobha Kumar
3.9K vistas111 diapositivas
Design and implementation of complex floating point processor using fpga por
Design and implementation of complex floating point processor using fpgaDesign and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpgaVLSICS Design
695 vistas9 diapositivas
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog... por
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...Sara Alvarez
3 vistas12 diapositivas
System on Chip Based RTC in Power Electronics por
System on Chip Based RTC in Power ElectronicsSystem on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power ElectronicsjournalBEEI
13 vistas6 diapositivas
Mirabilis_Design AMD Versal System-Level IP Library por
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryDeepak Shankar
22 vistas53 diapositivas
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und... por
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
7.4K vistas56 diapositivas

Similar a Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithmetic Operations(20)

Cse viii-advanced-computer-architectures-06cs81-solution por Shobha Kumar
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solution
Shobha Kumar3.9K vistas
Design and implementation of complex floating point processor using fpga por VLSICS Design
Design and implementation of complex floating point processor using fpgaDesign and implementation of complex floating point processor using fpga
Design and implementation of complex floating point processor using fpga
VLSICS Design695 vistas
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog... por Sara Alvarez
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Sara Alvarez3 vistas
System on Chip Based RTC in Power Electronics por journalBEEI
System on Chip Based RTC in Power ElectronicsSystem on Chip Based RTC in Power Electronics
System on Chip Based RTC in Power Electronics
journalBEEI13 vistas
Mirabilis_Design AMD Versal System-Level IP Library por Deepak Shankar
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
Deepak Shankar22 vistas
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und... por Joachim Schlosser
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Joachim Schlosser7.4K vistas
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr... por IRJET Journal
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET Journal34 vistas
IRJET- ALPYNE - A Grid Computing Framework por IRJET Journal
IRJET- ALPYNE - A Grid Computing FrameworkIRJET- ALPYNE - A Grid Computing Framework
IRJET- ALPYNE - A Grid Computing Framework
IRJET Journal19 vistas
Co question bank LAKSHMAIAH por veena babu
Co question bank LAKSHMAIAH Co question bank LAKSHMAIAH
Co question bank LAKSHMAIAH
veena babu4.5K vistas
Scaling Application on High Performance Computing Clusters and Analysis of th... por Rusif Eyvazli
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
Rusif Eyvazli63 vistas
Instruction Set Extension of a Low-End Reconfigurable Microcontroller in Bit-... por IJECEIAES
Instruction Set Extension of a Low-End Reconfigurable Microcontroller in Bit-...Instruction Set Extension of a Low-End Reconfigurable Microcontroller in Bit-...
Instruction Set Extension of a Low-End Reconfigurable Microcontroller in Bit-...
IJECEIAES6 vistas
Develop High-bandwidth/low latency electronic systems for AI/ML application por Deepak Shankar
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDevelop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML application
Deepak Shankar83 vistas
Co emulation of scan-chain based designs por ijcsit
Co emulation of scan-chain based designsCo emulation of scan-chain based designs
Co emulation of scan-chain based designs
ijcsit450 vistas
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System por Mostafa Khamis
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemSynopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Mostafa Khamis4.5K vistas

Último

K8S Roadmap.pdf por
K8S Roadmap.pdfK8S Roadmap.pdf
K8S Roadmap.pdfMaryamTavakkoli2
6 vistas1 diapositiva
What is Unit Testing por
What is Unit TestingWhat is Unit Testing
What is Unit TestingSadaaki Emura
23 vistas25 diapositivas
zincalume water storage tank design.pdf por
zincalume water storage tank design.pdfzincalume water storage tank design.pdf
zincalume water storage tank design.pdf3D LABS
5 vistas1 diapositiva
802.11 Computer Networks por
802.11 Computer Networks802.11 Computer Networks
802.11 Computer NetworksTusharChoudhary72015
10 vistas33 diapositivas
SUMIT SQL PROJECT SUPERSTORE 1.pptx por
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptxSumit Jadhav
13 vistas26 diapositivas
Generative AI Models & Their Applications por
Generative AI Models & Their ApplicationsGenerative AI Models & Their Applications
Generative AI Models & Their ApplicationsSN
8 vistas1 diapositiva

Último(20)

zincalume water storage tank design.pdf por 3D LABS
zincalume water storage tank design.pdfzincalume water storage tank design.pdf
zincalume water storage tank design.pdf
3D LABS5 vistas
SUMIT SQL PROJECT SUPERSTORE 1.pptx por Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 13 vistas
Generative AI Models & Their Applications por SN
Generative AI Models & Their ApplicationsGenerative AI Models & Their Applications
Generative AI Models & Their Applications
SN8 vistas
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ... por AltinKaradagli
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
AltinKaradagli9 vistas
Effect of deep chemical mixing columns on properties of surrounding soft clay... por AltinKaradagli
Effect of deep chemical mixing columns on properties of surrounding soft clay...Effect of deep chemical mixing columns on properties of surrounding soft clay...
Effect of deep chemical mixing columns on properties of surrounding soft clay...
AltinKaradagli6 vistas
Instrumentation & Control Lab Manual.pdf por NTU Faisalabad
Instrumentation & Control Lab Manual.pdfInstrumentation & Control Lab Manual.pdf
Instrumentation & Control Lab Manual.pdf
NTU Faisalabad 5 vistas
_MAKRIADI-FOTEINI_diploma thesis.pptx por fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi8 vistas
MSA Website Slideshow (16).pdf por msaucla
MSA Website Slideshow (16).pdfMSA Website Slideshow (16).pdf
MSA Website Slideshow (16).pdf
msaucla68 vistas
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx por lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang7853 vistas
Machine Element II Course outline.pdf por odatadese1
Machine Element II Course outline.pdfMachine Element II Course outline.pdf
Machine Element II Course outline.pdf
odatadese19 vistas

Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithmetic Operations

  • 1. ISSN (e): 2250 – 3005 || Volume, 06 || Issue, 04||April – 2016 || International Journal of Computational Engineering Research (IJCER) www.ijceronline.com Open Access Journal Page 1 Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithmetic Operations Semih Aslan Ingram School of Engineering Texas State University San Marcos, Texas, 78666, USA I. Introduction Today, a significant number of embedded systems are focused on multimedia applications with almost insatiable demand for low cost, high performance and low power hardware. Designing complex systems such as image and video processing, compression, face recognition, object tracking, 3G or 4G modems, multi-standard CODECs, and HD decoding schemes requires the integration of many complex blocks and a long verification process [1][2]. These complex designs are based on I/O peripherals, one or more processors, bus interfaces, A/D, D/A, embedded software, memories and sensors. In the past, complete systems were designed with multiple chips and connected together on PCBs, but with today’s technology, all functions can be incorporated in a single chip. These complete systems are known as System-on-Chip (SoC) [2].System-on-chip (SoC) designs are mainly accomplished by using Register Transfer Languages (RTL) such as Verilog and VHDL. RTL design flow [1] [2] for both FPGA and ASIC is similar and is shown in Figure 1. An algorithm can be converted to RTL using the behavioral model description method or by using pre-defined IP core blocks. After completing this RTL code, formal verification must be done before implementation. After implementation of the RTL code, timing verification needs to be done for proper operation. RTL design abstracts logic structures, timing and registers [1]. Because of this, every clock change causes a state change in the design. This timing dependency causes every event to be simulated, which results in a slower simulation time and longer verification period of the design. The design and verification of an algorithm in RTL in Figure 1 can take up 50-60% of the “Time to Market” (TTM). The RTL design becomes impractical for larger systems that have high data flow between the blocks, and it requires millions of gates. Even though design time may improve by using behavioral modeling and IP cores, the difficulty in synthesis, poor performance results and rapid changes in the design make IP cores difficult to adapt and change. Therefore, systems rapidly become obsolete.The limitations of RTL and longer TTM forced designers to think of the design as a whole system rather than blocks. In addition, software integration in SoC was always done after the hardware was designed. When the system gets more complex, software integration is desirable during hardware Abstract Embedded systems used in real-time applications require low power, less area and a high computation speed. For digital signal processing (DSP), image processing and communication applications, data are often received at a continuously high rate. Embedded processors have to cope with this high data rate and process the incoming data based on specific application requirements. Even though there are many different application domains, they all require arithmetic operations that quickly compute the desired values using a larger range of operation, reconfigurable behavior, low power and high precision. The type of necessary arithmetic operations may vary greatly among different applications. The RTL-based design and verification of one or more of these functions may be time-consuming. Some High Level Synthesis tools reduce this design and verification time but may not be optimal or suitable for low power applications. The developed MATLAB-based Arithmetic Engine improves design time and reduces the verification process, but the key point is to use a unified design that combines some of the basic operations with more complex operations to reduce area and power consumption. The results indicate that using the Arithmetic Engine from a simple design to more complex systems can improve design time by reducing the verification time by up to 62%. The MATLAB-based Arithmetic Engine generates structural RTL code, a testbench, and gives the designers more control. The MATLAB-based design and verification engine uses optimized algorithms for better accuracy at a better throughput. Keywords: FPGA, High Level Synthesis, MATLAB, Optimized Hardware, Power Efficient, RTL.
  • 2. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 2 implementation. Over the last two decades, designers were forced to find new methods to replace RTL due to improvements in SoC and shorter TTM. Because of the extensive work done in Electronics System Level Design (ESLD), HW/SW co-design of a system and High Level Synthesis (HLS)[2][4] are integrated into FPGA and ASIC design flow. Algorithm RTL Timing Translate Map Place&Route IMPLEMENT Bit File Formal Proof Logic Synthesis Figure 1. FPGA RTL level synthesis flow The next section will describe the proposed MATLAB HLS Arithmetic (MHA) Engine design and implementation. Section III and IV will focus on the error analysis and testbench generation respectively and the conclusion will describe future work and improvements. II. Mha Engine RTL description of a system can be implemented from a behavioral description of the system in Perl, C, Python and MATLAB. This will result in a faster verification process and shorter TTM. It is also possible to have a hybrid design where RTL blocks can be integrated with HLS [2].The HLS design flow shows that a group of algorithms that represent the whole system or parts of a system can be implemented using a high level language such as Perl, C, C++ , Java, MATLAB [2][5]. Each part in the system can be tested independently before the whole system is tested. During this testing process, the RTL testbenches may also be generated. After testing is complete, the system can be partitioned into HW and SW. This enables SW designers to join the design process during HW design; in addition, RTL can be tested by using both HW/SW together. After the verification process, the design can be implemented using FPGA synthesis tools.The integration of HLS into FPGA design flow is shown in Figure 2. MATLAB Algorithm High Level Synthesis (HLS) Timing Translate Map Place&Route IMPLEMENT Bit File Formal Proof RTL MHA Engine Figure 2. FPGA high level synthesis flow with MHA Engine Since the early days of VLSI design, application-specific hardware has been used for optimal implementation of algorithms. This approach is considered the fastest design scheme but is also the most area consuming system due to the inherently redundant nature of a design that only computes one operation. However, there are other possible designs for DSP implementations that can be used for two or more operations [1][3]. This design approach consists of processing blocks that can compute multiple operations using dedicated hardware designed for a particular cluster of operations. An improved design approach should exploit the redundancy and common elements that exist among the sub-blocks. This would result in shared building blocks and dramatically reduced hardware requirements.
  • 3. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 3 The proposed design focuses on designing a large system that will be faster with a design principle similar to HLS, asexplained above. The main work focuses on multi-purpose, reused hardware structures that produce a unified, area efficient reconfigurable system. This design can reduce the area by 64% [2][[6]. The components of the MHA Engineare shown in Figure 3. Arithmetic Operation . Addition/Subtraction . Multiplication . Division . Square Root . Inverse Square Root . Sin(θ), Cos(θ), Tan(θ), Cot(θ) . Sinh(θ), Cosh(θ), Tanh(θ) . ex ControlI/O MHE Engine Figure 3. MATLAB Based HLS Arithmetic Engine (MHA) block The MHA block has three important principles:  Compute required arithmetic operations  Customized range and accuracy  Generate an area-efficient, fast system for low power applications The MHA accepts inputs from the user via two GUIs to make it more user friendly and efficient. The “Main” GUI that is shown in Figure 4 below includes the following sections:  FPGA or ASIC support  Vendor based IP Core support  Project Name (default is c:MHAMHA)  Top Module Name (default is MHA)  Language – Verilog or VHDL (design and verifications – current system only supports Verilog HDL)  Rounding - Truncation or RNE (Rounding cannot be done without a selection)  Number system – Fixed or Floating Point (Fixed point up to 64-bit - current system only supports Fixed Point Number system) Figure 4. Main GUI for MHA Engine  Signed or unsigned number systems  Target – Frequency and throughput  Area or speed based optimization
  • 4. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 4  Testbench generation o Automated testbench with MATLAB o Modelsim .do file for fast automation o Automated testbench file for Modelsim o Error comparison with MATLAB o User defined test data option The “Arithmetic” tab shown in Figure 5 has the following sections:  Basic arithmetic operations o Addition/Subtraction (Area or speed optimized based on Ripple-Carry Adder (RCA) or Carry-Lookahead Adder (CLA)) o Multiplication (Array or Booth multiplier)  Advanced arithmetic operations o Division (Newton-Raphson, Goldschmidt, or CORDIC) o Square Root (Newton-Raphson, Goldschmidt, or CORDIC) o Inverse Square Root (Newton-Raphson, Goldschmidt, or CORDIC)  Elementary functions o Trigonometric functions – sine, cosine, tangent, and cotangent (table method, CORDIC or polynomial based design) o Hyperbolic functions – sinh, cosh, tanh (table method, CORDIC or polynomial based design) o Exponential function – exp(x) (table method, CORDIC or polynomial based design) Figure 5. Arithmetic operations GUI The MHA uses a bottom-up design process that starts with the elementary functions and then moves to the simplest arithmetic operations such as multiplication and addition. This is shown in Figure 6 below. This design flow contains the following procedure: First, selection of elementary functions [7], selection of basic arithmetic operations, and generation of area efficient hardware for FPGA and VLSI. There are 2-64 bit selections that are suitable for a vast variety of applications with the requested precision. The section’s addition and multiplications are used based on the previous designs. Division, inverse square root and square roots are designed based on the same architecture, and the modified design reduces the area by 64% [8]. Next, the CORDIC [9] [10] or polynomial methods [4] are used to calculate elementary functions [7] [11]. This area- efficient design is optimized for speed by implementing a smart control system. For performance evaluation and synthesis are implemented with Xilinx FPGAs [12] and Microwind [13] VLSI design tool.
  • 5. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 5 Arithmetic Block Input Parameters Elementary Functions Arithmetic HDL Testbench Precision Figure 6. The MHA design flow III. Error Analysis When designing hardware with many arithmetic operations, one of the most important objectives is to produce results that have a minimal absolute and average computation error. Arithmetic operations in digital systems generally introduce three types of errors: number representation, rounding, and algorithmic or design error [14][15].The exact representation of some numbers or events in radix-n may not be possible due to the limitations in ADCs, the sampling rate, and the number of available bits. In addition, many numbers cannot be converted from radix-n to radix-m without an error. For example, number 0.1 and 0.2 in radix-10 cannot be represented in radix-2 without an error. This error can be reduced by increasing the number of bits. The reduction of this error with respect to the number of bits is shown in Figure 7. Figure 7. Error representation of number radix conversion Figure 8 shows the error generated for radix-10 to radix-2 conversion of 128 fractional numbers (fractional part of 30 bits). Figure8. Random 128-number conversion error from Radix-10 to Radix-2
  • 6. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 6 During and after the calculation of certain arithmetic operations, the total number of bits may exceed the number of bits available; these values need to be rounded. For example, multiplying two n-bit numbers produces a product of 2n-bit and this result may need to be represented with n-bit.If there were more multiplications on the design path, the number of bits would increase in a linear fashion. To prevent this, each multiplier output needs to be rounded. There are a few ways to implement rounding in hardware, with the most commonly used methods being round to the nearest even, round towards zero (truncation), round down (floor), round up (ceiling) and round away from zero [14]. In this section, truncation (TRA)[14] and round to the nearest even (RNE)[14] schemes are compared. During these rounding operations, an error value is introduced. The RNE and TRA and their error values are shown in Table 1. Table 1. RNE and TRA Number RNE TRA Rounded Value Error Rounded Value Error X0.00 X0. 0.00 X0. 0.00 X0.01 X0. 0.25 X0. 0.25 X0.10 X0. 0.50 X0. 0.50 X0.11 X0.+ulp -0.25 X0. 0.75 X1.00 X1. 0.00 X1. 0.00 X1.01 X1. 0.25 X1. 0.25 X1.10 X1.+ulp -0.50 X1. 0.50 X1.11 X1.+ulp -0.25 X1. 0.75 Total --- 0 --- 3.00 Figure 9 shows advantage of RNE over TRA when average error is considered. The requested number of precision and selected rounding scheme can affect the size of the hardware and overall speed and throughput. Users can change the selected accuracy to see area and throughput estimates simultaneously without increasing the overall design time. This will make it possible to select the optimal design for synthesis. The MHA Engine can create hardware for precision based on increase the bit size during the mid-operation and apply rounding before the output stage. Figure 9. 32-Bit to 16-Bit Rounding Errors with RNE and TRA Iv. Testbench Generation One of the most important and complicated sections of the MHAEngine is generation of the testbench files and error checking using MATLAB and Modelsim. Before the RTL code is synthesized, it can be tested using a testbench that is created using MATLAB and Modelsim. The testbench generation and error checking block diagram is shown in Figure 10 below. MODELSIM USER CONSTRAINTS MATLAB VERILOG TESTBENCH RTL DUT RESULTS (text file) Verification and Error Files Figure 10. MATLAB and Modelsim flow
  • 7. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 7 After generation of the design and testbench files,the user defined test vectors need to be generated. The first step is to generate positive random numbers [0,1]. These numbers must be converted into positive and negative numbers based on signed numbers. To generate random test values and test results, the following procedure is followed:  Get the user defined testvector number n.  Generate random test vectors (T{n})  Generate random binary numbers using MATLAB o For fixed-point signed numbers: T_Bin=dec2bin(T*2^n,n) o For fixed-point unsigned numbers: T_Bin=dec2bin(T*2^(n-i),n)  Generate the Modelsim testbench file and get results  Compare results using MATLAB After generation of the design, testbench, and test vector files, the next step is to generate a Modelsim tcl .do file that can be transferred into Modelsim all together. The .do file will generate the project file and will import all design files, including testbench, into Modelsim. This will run all files and generate the results as a text file. Once Modelsim-generated results are imported into MATLAB, correct operation and error analysis needs to be performed. An important issue which needs to be addressed during the verification process is working with negative fixed-point numbers in MATLAB. It is important because it does not convert negative binary numbers and binary floating numbers to a decimal number. This problem is addressed using the MATLAB codes given in Figure 11. Figure 11. Signed binary to decimal conversion V. Conclusion An area efficient, MATLAB based HLS engine for arithmeticoperations is designed for low power and high-speed applications. The MHA Engine decreases design system time and verification by up to 64% without compromising speed and efficiency. The MHA Engine uses a smart control system that is optimized based on the desired operations. The MHA Engine is a bridge between RTL and HLS. It uses RTL-based basic blocks to design most complicated arithmetic operations using structural model design and HLS-style fast and optimized verification. Any designed system can be reconfigured at any time in any way in MHA Engine without going through the same design and verification hassle. MATLAB-based verification makes it possible to use all the features of MATLAB for faster and more efficient verification. The MHA Engine can be easily reconfigurable to systems available at any level, due to changes in the computer system and software. As explained above, this system generates area efficient fast arithmetic and elementary functionsthat can be used over a wide area of applications in DSP, image processing, and communication systems. It can be used for FFT, DCT and DWT calculations and Chirplet transforms [16][17][18]. It can also be very important for educational institutions in order to test their systems using the verification testbench that, when desired, works as an independent design tool. Overall, this work will forge the way for those who need to make sudden changes in their systems and need fast verification. They can adopt and apply any changes using the MHA Engine or generate similar systems for faster design and verification. In addition, because the MHA Engine generated code is designed structurally, code can be changed easily at any level, if so desired. Future workwill integrate VHDL code and IEEE 754 floating point numbers (both single and double precision) implementation. Another future goal is to make this platform totally open source by using only Iverilog and replacing MATLAB with Octave.
  • 8. Matlab Based High Level Synthesis Engine For Area And Power Efficient Arithmetic Operations www.ijceronline.com Open Access Journal Page 8 References [1]. Hendry, D.C., and A.A. Duncan. "Area Efficient DSP Datapath Synthesis." Design Automation Conference (1995): 130-135. [2]. Aslan, S., Oruklu, E., and Saniie J., “A high-level synthesis and verification tool for fixed to floating point conversion”, IEEE International Midwest Symposium on Circuits and Systems, 2012, Pages, 908-911. [3]. Andrieux, J., M. Feix, G. Mourgues, P. Bertrand, B. Izrar, and V. Nguyen. "Optimum Smoothing of the Wigner Ville Distribution." IEEE Transactions on Acoustics, Speech, and Signal Processing 36.5(1987): 764-769. [4]. Kilts, S. Advanced FPGA Design Architecture, Implementation, and Optimizations. New York: Wiley Inter-Science, 2007. [5]. Chen, W. The VLSI Handbook. Boca Raton: CRC Publisher, 2007. [6]. Dehon, A., and S. Hauck. Reconfigurable Computing The Theory and Practice of FPGA-Based Computing. Burlington, Massachusetts: Elsevier, 2008. [7]. Walther, J.S. "A unified Algortihm for Elementary Functions." American Federation of Information Processing Societies Joint Computer Conferences (1971): 379-385. [8]. Oruklu, E., J. Saniie, and S. Aslan. "Realization of Area Efficient QR Factorization using Unified Diivision, Square Root and Inverse Sqaure Root Hardware." IEEE Electro/Information Technology (2009): 245-250. [9] Volder, J. "The CORDIC Trigonometric Computing Technique." IEEE Transactions Electronic Computers 8.3 (1959): 330-334. [9]. 10] Striling, W. C., and T. K. Moon. Mathematical Methods and algorithms for Signal Processing. New Jersey: Prentice Hall, 2000. [10]. Xilinx. (2016), http://www.xilinx.com/ [11]. Microwind (2106) http://microwind.net/ [12]. Stine, J. E. "Digital Computer Arithmetic Datapath Design using Verilog HDL." Digital Computer Arithmetic Datapath Design using Verilog HDL. Norwell, Massachusetts: Kluwer Academic Publishing, 2004. [13]. Teukolsky, S. A., W. T. Vetterling, B. P. Flannery, and W. H. Press. "Numerical Recipes: The Art of Scientific Computing." Numerical Recipes: The Art of Scientific Computing, 3rd ed. New York, New York: Cambridge University Press, 2007. [14]. Omar, J., E. E. Swartzlander Jr., and M. J. Schulte. "Optimal Initial Approximations for the Newton-Raphson Division Algorithm." Springer-Verlag Journal of Computing 53.3-4 (1994): 233-242. [15]. Seidel, P-M, W. E. Ferguson, and G. Even. "A Parametric Error Analysis of Goldschmidt’s Division Algorithm." Journal of Computer and System Sciences 70.1 (2005): 118-139. [16]. Chunduri, K. C. "Implementation of Adaptive Filter Structures on a Fixed Point Signal Processor for Acoustical Noise Reduction" (2006).