Digital Logic and Electronic Systems Design Company

Floating Point Verilog RTL Library User Guide

Pulse Logic

Document version: 1.0

www.pulselogic.com.pl

Document date: May 2009

e-mail: [email protected]

Floating Point Verilog RTL Library

Table of Contents

Introduction
Floating Point Adder/Subtractor – Float32Add
Floating Point Multiplier – Float32Mul
Floating Point Divider – Float32Div
Floating Point Square Root – Float32Sqrt
Known Problems and Solutions
License


Introduction

The IEEE 754 standard defines the computer representation of binary floating point numbers. It is the most commonly used representation in modern computing machines. The standard defines at least five different formats for floating point numbers. However, the following two seem to be the most popular:

• Single precision: a binary format that occupies 32 bits (4 bytes).
• Double precision: a binary format that occupies 64 bits (8 bytes).

This library provides arithmetic operations for single precision floating point numbers, which are represented using 32 bits. A floating point number is represented in binary form using the following fields:

• sign bit
• exponent field
• significand (mantissa) field

The following figure shows the structure of a single precision floating point number:

[ SIGN (bit 31) | EXPONENT (bits 30:23) | MANTISSA (bits 22:0) ]

Figure 1. Single precision numbers.

The exact result of a floating point operation may need more mantissa bits than the format provides. In such a case, the result must be rounded before it can be stored in the fields defined above. The IEEE standard defines several rounding schemes. Rounding to nearest seems to be the most popular, and it is the scheme used in this library.

Floating point operations can run into problems such as invalid operations, underflow, or overflow. Such exceptions are signaled using special values such as INF (infinity) or NaN (Not a Number). Exceptions are fully supported in this library.

The result of an operation can also be too small to be represented correctly as a normalized IEEE floating point number. In such a case, the hardware changes the representation and provides the result as a denormalized number. Denormalized numbers are fully supported in this library, and they are also accepted as valid operands to arithmetic operations.
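The field layout of Figure 1 and the special encodings mentioned above can be decoded in software, for example in a C reference model used alongside a testbench. The sketch below is not part of the library; the type and function names (Float32Fields, f32_unpack, f32_pack, f32_classify, f32_bits) are hypothetical. It relies on the standard IEEE 754 encoding rules: an exponent field of all ones marks INF (zero mantissa) or NaN (non-zero mantissa), and an exponent field of all zeros marks zero or a denormal number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a single precision number, laid out as in Figure 1. */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30:23, biased by 127 */
    uint32_t mantissa;  /* bits 22:0; the leading 1 of normal numbers is not stored */
} Float32Fields;

typedef enum {
    F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INFINITY, F32_NAN
} Float32Class;

/* Split a 32-bit pattern into its sign, exponent and mantissa fields. */
static Float32Fields f32_unpack(uint32_t bits) {
    Float32Fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Assemble the fields back into a 32-bit pattern. */
static uint32_t f32_pack(Float32Fields f) {
    assert(f.sign <= 1u && f.exponent <= 0xFFu && f.mantissa <= 0x7FFFFFu);
    return (f.sign << 31) | (f.exponent << 23) | f.mantissa;
}

/* Exponent all ones: INF or NaN. Exponent all zeros: zero or denormal. */
static Float32Class f32_classify(uint32_t bits) {
    Float32Fields f = f32_unpack(bits);
    if (f.exponent == 0xFFu)
        return f.mantissa ? F32_NAN : F32_INFINITY;
    if (f.exponent == 0u)
        return f.mantissa ? F32_DENORMAL : F32_ZERO;
    return F32_NORMAL;
}

/* Reinterpret a C float as its bit pattern (memcpy avoids aliasing problems). */
static uint32_t f32_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For example, f32_unpack(f32_bits(1.0f)) yields sign 0, exponent 127 and mantissa 0, while the pattern 0x00000001 is the smallest positive denormal number.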


Floating Point Adder/Subtractor – Float32Add

The Float32Add core is provided free of charge as obfuscated Verilog code. The code is difficult to read because text formatting has been removed and identifiers have been replaced with automatically generated strings. However, the code is fully functional and should work correctly in any simulation or synthesis tool. You can simulate it or synthesize it for your FPGA or ASIC technology. Modifying the code is extremely difficult. Please report any bugs to: [email protected]

Features:

• Technology-independent Verilog RTL design (obfuscated code)
• Synchronous design optimized for area and performance
• IEEE 754 single precision compliant
• Exceptions supported
• Denormal numbers supported, including denormal arguments
• Rounding to nearest supported
• Output latency: 12 clock cycles
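Rounding to nearest can be illustrated in software. The C sketch below is not taken from the core: it rounds an extended significand to nearest and breaks ties to even, which is the IEEE 754 default (this guide does not state how the cores break ties). The name round_nearest_even is hypothetical. Hardware usually keeps only guard, round and sticky bits of the discarded part instead of all of it, which leads to the same rounding decision.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round to nearest, ties to even. `value` holds the bits to keep followed
 * by `extra` low-order bits that do not fit in the destination field.
 * The caller must renormalize if rounding carries into a new leading bit.
 */
static uint64_t round_nearest_even(uint64_t value, unsigned extra) {
    assert(extra > 0 && extra < 64);
    uint64_t kept = value >> extra;
    uint64_t dropped = value & ((UINT64_C(1) << extra) - 1);
    uint64_t half = UINT64_C(1) << (extra - 1);
    if (dropped > half || (dropped == half && (kept & 1u)))
        kept += 1;  /* above halfway, or exactly halfway with an odd LSB */
    return kept;
}
```

For example, with two extra bits, 0b1010 (exactly halfway, even LSB) rounds down to 0b10, while 0b1110 (exactly halfway, odd LSB) rounds up to 0b100.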

The following figure shows the ports of the Float32Add module:

[Float32Add block symbol. Inputs: leftArg [31 : 0], rightArg [31 : 0], loadArgs, addSub, nRST, CLK. Outputs: sum [31 : 0], status [2 : 0], busy.]

Figure 2. Float32Add ports


Port               Direction  Description
CLK                IN         Global clock. Rising edge active.
nRST               IN         Global reset. Active low.
leftArg [31 : 0]   IN         Left argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
rightArg [31 : 0]  IN         Right argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
addSub             IN         Operation selection. 1 - Addition, 0 - Subtraction.
loadArgs           IN         Arguments load strobe. Active high.
status [2 : 0]     OUT        Status output. Bit 2 - Not a Number, Bit 1 - Infinity, Bit 0 - Denormal.
busy               OUT        Busy output. High means calculation in progress, low means result ready.
sum [31 : 0]       OUT        Result. Valid if busy is low. Status bits denote exceptions.

Table 1. Float32Add ports description
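A testbench that checks the status output needs an expected status value for every result. The guide only states that the status bits denote exceptions, so the C sketch below assumes that each flag reflects the class of the 32-bit result pattern: bit 2 for a NaN, bit 1 for an infinity, bit 0 for a denormal result. The names expected_status and check_dut_status are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the status[2:0] output, taken from Table 1. */
enum {
    STATUS_DENORMAL = 1u << 0,
    STATUS_INFINITY = 1u << 1,
    STATUS_NAN      = 1u << 2
};

/*
 * Hypothetical reference model: expected status for a 32-bit result,
 * ASSUMING each flag reflects the class of the result pattern.
 * Zero and normal results give 0.
 */
static unsigned expected_status(uint32_t result) {
    uint32_t exponent = (result >> 23) & 0xFFu;
    uint32_t mantissa = result & 0x7FFFFFu;
    if (exponent == 0xFFu)
        return mantissa ? STATUS_NAN : STATUS_INFINITY;
    if (exponent == 0u && mantissa != 0u)
        return STATUS_DENORMAL;
    return 0u;
}

/* Abort the reference check if the observed status differs from the model. */
static void check_dut_status(uint32_t result, unsigned status) {
    assert(status == expected_status(result));
}
```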

Both input arguments should be applied to the leftArg and rightArg inputs, together with the loadArgs strobe, on a rising clock edge. The Float32Add module loads the arguments on the next rising clock edge and sets the busy flag. A high level on the busy output means that the calculation is in progress and the result is not ready. The result can be read when busy is low; at the same time, new arguments can be loaded. See the waveform below:

Figure 3. Float32Add simulation waveform
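The handshake described above can be modeled at the cycle level, for example to plan how a driver waits for a result. The C sketch below is hypothetical: the names are illustrative, and it assumes that busy rises on the rising edge that samples loadArgs, stays high for `latency` cycles (12 for Float32Add according to the feature list), and that a strobe arriving while busy is high is ignored. The guide does not specify these details, so exact cycle counts should be taken from the simulation waveform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cycle-level model of the loadArgs/busy handshake. */
typedef struct {
    unsigned latency;    /* e.g. 12 for Float32Add (assumed busy duration) */
    unsigned remaining;  /* cycles left until the result is ready */
    bool busy;
} HandshakeModel;

static void model_reset(HandshakeModel *m, unsigned latency) {
    assert(latency > 0);
    m->latency = latency;
    m->remaining = 0;
    m->busy = false;  /* after nRST: no calculation in progress */
}

/* Advance one rising clock edge with the given loadArgs level. */
static void model_clock(HandshakeModel *m, bool load_args) {
    if (m->busy) {
        if (--m->remaining == 0)
            m->busy = false;  /* result ready */
    } else if (load_args) {
        m->busy = true;       /* arguments captured, calculation starts */
        m->remaining = m->latency;
    }
}

/* Load once, then count the edges until busy is low again. */
static unsigned cycles_until_ready(HandshakeModel *m) {
    unsigned cycles = 0;
    model_clock(m, true);  /* loadArgs sampled on this edge */
    while (m->busy) {
        model_clock(m, false);
        ++cycles;
    }
    return cycles;
}
```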


Floating Point Multiplier – Float32Mul

The Float32Mul core is provided free of charge as obfuscated Verilog code. The code is difficult to read because text formatting has been removed and identifiers have been replaced with automatically generated strings. However, the code is fully functional and should work correctly in any simulation or synthesis tool. You can simulate it or synthesize it for your FPGA or ASIC technology. Modifying the code is extremely difficult. Please report any bugs to: [email protected]

Features:

• Technology-independent Verilog RTL design (obfuscated code)
• Synchronous design optimized for area and performance
• IEEE 754 single precision compliant
• Exceptions supported
• Denormal numbers supported, including denormal arguments
• Rounding to nearest supported
• Output latency: 10 clock cycles
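Multiplication of two normal numbers follows a well-known data path: the result sign is the XOR of the operand signs, the biased exponents are added and one bias (127) is subtracted, and the 24-bit significands (with the hidden leading 1 restored) are multiplied, normalized and rounded. The C sketch below illustrates this textbook algorithm under strong simplifying assumptions: both operands and the result are normal numbers, so zero, denormal, INF/NaN, overflow and underflow handling are left out. It is not the Float32Mul implementation, and f32_mul_normal is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified multiply: normal operands and a normal result only. */
static uint32_t f32_mul_normal(uint32_t a, uint32_t b) {
    uint32_t ea = (a >> 23) & 0xFFu, eb = (b >> 23) & 0xFFu;
    assert(ea != 0 && ea != 0xFFu && eb != 0 && eb != 0xFFu);

    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t exp = (int32_t)ea + (int32_t)eb - 127;  /* remove one bias */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;      /* restore hidden 1 */
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = ma * mb;                        /* 47 or 48 significant bits */

    unsigned shift = 23;
    if (prod & (UINT64_C(1) << 47)) {               /* significand product in [2, 4) */
        shift = 24;
        exp += 1;
    }
    uint64_t mant = prod >> shift;                  /* 24 bits including leading 1 */
    uint64_t rest = prod & ((UINT64_C(1) << shift) - 1);
    uint64_t half = UINT64_C(1) << (shift - 1);
    if (rest > half || (rest == half && (mant & 1u)))
        mant += 1;                                  /* round to nearest, ties to even */
    if (mant == (UINT64_C(1) << 24)) {              /* rounding carried out */
        mant >>= 1;
        exp += 1;
    }
    assert(exp >= 1 && exp <= 254);                 /* result must stay normal */
    return sign | ((uint32_t)exp << 23) | ((uint32_t)mant & 0x7FFFFFu);
}
```

For example, f32_mul_normal(0x3FC00000u, 0x40000000u) returns 0x40400000u (1.5 × 2.0 = 3.0). Because the sketch rounds to nearest with ties to even, its results can be compared bit for bit with a C float multiplication on an IEEE 754 machine.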

The following figure shows the ports of the Float32Mul module:

[Float32Mul block symbol. Inputs: leftArg [31 : 0], rightArg [31 : 0], loadArgs, nRST, CLK. Outputs: product [31 : 0], status [2 : 0], busy.]

Figure 4. Float32Mul ports


Port               Direction  Description
CLK                IN         Global clock. Rising edge active.
nRST               IN         Global reset. Active low.
leftArg [31 : 0]   IN         Left argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
rightArg [31 : 0]  IN         Right argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
loadArgs           IN         Arguments load strobe. Active high.
status [2 : 0]     OUT        Status output. Bit 2 - Not a Number, Bit 1 - Infinity, Bit 0 - Denormal.
busy               OUT        Busy output. High means calculation in progress, low means result ready.
product [31 : 0]   OUT        Result. Valid if busy is low. Status bits denote exceptions.

Table 2. Float32Mul ports description

Both input arguments should be applied to the leftArg and rightArg inputs, together with the loadArgs strobe, on a rising clock edge. The Float32Mul module loads the arguments on the next rising clock edge and sets the busy flag. A high level on the busy output means that the calculation is in progress and the result is not ready. The result can be read when busy is low; at the same time, new arguments can be loaded. See the waveform below:

Figure 5. Float32Mul simulation waveform


Floating Point Divider – Float32Div

The Float32Div core is provided free of charge as obfuscated Verilog code. The code is difficult to read because text formatting has been removed and identifiers have been replaced with automatically generated strings. However, the code is fully functional and should work correctly in any simulation or synthesis tool. You can simulate it or synthesize it for your FPGA or ASIC technology. Modifying the code is extremely difficult. Please report any bugs to: [email protected]

Features:

• Technology-independent Verilog RTL design (obfuscated code)
• Synchronous design optimized for area and performance
• IEEE 754 single precision compliant
• Exceptions supported
• Denormal numbers supported, including denormal arguments
• Rounding to nearest supported
• Output latency: 67 clock cycles
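Division of two normal numbers follows the same pattern as multiplication: the sign is the XOR of the operand signs, the result exponent is the difference of the biased exponents plus the bias, and the significands are divided. The C sketch below computes one quotient bit beyond the 24 kept bits and uses the non-zero remainder as a sticky bit to round to nearest, ties to even. It makes the same simplifying assumptions as a textbook illustration (normal operands and a normal result; zero, denormal, INF/NaN, overflow and underflow handling omitted). It is not the Float32Div implementation, and f32_div_normal is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified divide: normal operands and a normal result only. */
static uint32_t f32_div_normal(uint32_t a, uint32_t b) {
    uint32_t ea = (a >> 23) & 0xFFu, eb = (b >> 23) & 0xFFu;
    assert(ea != 0 && ea != 0xFFu && eb != 0 && eb != 0xFFu);

    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t exp = (int32_t)ea - (int32_t)eb + 127;  /* restore the bias */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;      /* restore hidden 1 */
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;

    if (ma < mb) {       /* keep the significand quotient in [1, 2) */
        ma <<= 1;
        exp -= 1;
    }
    uint64_t num = ma << 24;
    uint64_t q = num / mb;         /* 24-bit significand followed by a guard bit */
    uint64_t rem = num % mb;       /* non-zero remainder acts as the sticky bit */
    uint64_t mant = q >> 1;
    uint64_t guard = q & 1u;
    if (guard && (rem != 0 || (mant & 1u)))
        mant += 1;                 /* round to nearest, ties to even */
    if (mant == (UINT64_C(1) << 24)) {
        mant >>= 1;
        exp += 1;
    }
    assert(exp >= 1 && exp <= 254);  /* result must stay normal */
    return sign | ((uint32_t)exp << 23) | ((uint32_t)mant & 0x7FFFFFu);
}
```

For example, f32_div_normal(0x3F800000u, 0x40400000u) returns 0x3EAAAAABu, the single precision value nearest to 1/3.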

The following figure shows the ports of the Float32Div module:

[Float32Div block symbol. Inputs: leftArg [31 : 0], rightArg [31 : 0], loadArgs, nRST, CLK. Outputs: quotient [31 : 0], status [2 : 0], busy.]

Figure 6. Float32Div ports


Port               Direction  Description
CLK                IN         Global clock. Rising edge active.
nRST               IN         Global reset. Active low.
leftArg [31 : 0]   IN         Left argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
rightArg [31 : 0]  IN         Right argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
loadArgs           IN         Arguments load strobe. Active high.
status [2 : 0]     OUT        Status output. Bit 2 - Not a Number, Bit 1 - Infinity, Bit 0 - Denormal.
busy               OUT        Busy output. High means calculation in progress, low means result ready.
quotient [31 : 0]  OUT        Result. Valid if busy is low. Status bits denote exceptions.

Table 3. Float32Div ports description

Both input arguments should be applied to the leftArg and rightArg inputs, together with the loadArgs strobe, on a rising clock edge. The Float32Div module loads the arguments on the next rising clock edge and sets the busy flag. A high level on the busy output means that the calculation is in progress and the result is not ready. The result can be read when busy is low; at the same time, new arguments can be loaded. See the waveform below:

Figure 7. Float32Div simulation waveform


Floating Point Square Root – Float32Sqrt

The Float32Sqrt core is provided free of charge as obfuscated Verilog code. The code is difficult to read because text formatting has been removed and identifiers have been replaced with automatically generated strings. However, the code is fully functional and should work correctly in any simulation or synthesis tool. You can simulate it or synthesize it for your FPGA or ASIC technology. Modifying the code is extremely difficult. Please report any bugs to: [email protected]

Features:

• Technology-independent Verilog RTL design (obfuscated code)
• Synchronous design optimized for area and performance
• IEEE 754 single precision compliant
• Exceptions supported
• Denormal numbers supported, including denormal arguments
• Rounding to nearest supported
• Output latency:
    • 894 clock cycles (version with radix-two division)
    • 257 clock cycles (version with high-radix division)

The square root core is provided in two versions, which differ in the algorithm used for integer division. The first version uses the standard radix-two division algorithm. It is a resource-efficient core, but it needs a significant number of clock cycles to calculate the result. The second version uses a parallelized (high-radix) division algorithm. It needs fewer clock cycles but uses significantly more hardware resources.

The following figure shows the ports of the Float32Sqrt module:

[Float32Sqrt block symbol. Inputs: Arg [31 : 0], loadArgs, nRST, CLK. Outputs: sqrt [31 : 0], status [2 : 0], busy.]

Figure 8. Float32Sqrt ports


Port               Direction  Description
CLK                IN         Global clock. Rising edge active.
nRST               IN         Global reset. Active low.
arg [31 : 0]       IN         Input argument. Bit 31 - Sign, Bits 30:23 - Exponent, Bits 22:0 - Mantissa.
loadArgs           IN         Argument load strobe. Active high.
status [2 : 0]     OUT        Status output. Bit 2 - Not a Number, Bit 1 - Infinity, Bit 0 - Denormal.
busy               OUT        Busy output. High means calculation in progress, low means result ready.
sqrt [31 : 0]      OUT        Result. Valid if busy is low. Status bits denote exceptions.

Table 4. Float32Sqrt ports description

The input argument should be applied to the arg input, together with the loadArgs strobe, on a rising clock edge. The Float32Sqrt module loads the argument on the next rising clock edge and sets the busy flag. A high level on the busy output means that the calculation is in progress and the result is not ready. The result can be read when busy is low; at the same time, a new argument can be loaded. See the waveform below:

Figure 9. Float32Sqrt simulation waveform
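The radix-two division used by the resource-efficient Float32Sqrt version produces one quotient bit per iteration. The textbook restoring variant is sketched below in C for 32-bit unsigned integers. It illustrates the algorithm class named in this guide and is not the core's implementation; restoring_divide is a hypothetical name. A high-radix divider retires several quotient bits per step at the cost of more hardware, which matches the trade-off between the two Float32Sqrt versions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Radix-2 restoring division: one quotient bit per iteration, so a 32-bit
 * quotient takes 32 iterations. This comparison form subtracts only when
 * the partial remainder is large enough, which is equivalent to always
 * subtracting and then restoring the remainder after a negative result.
 */
static uint32_t restoring_divide(uint32_t dividend, uint32_t divisor,
                                 uint32_t *remainder) {
    assert(divisor != 0);
    uint64_t rem = 0;  /* 64 bits so that rem << 1 cannot overflow */
    uint32_t quotient = 0;
    for (int i = 31; i >= 0; --i) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down the next bit */
        quotient <<= 1;
        if (rem >= divisor) {  /* trial subtraction succeeds: quotient bit is 1 */
            rem -= divisor;
            quotient |= 1u;
        }
    }
    if (remainder)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

For example, restoring_divide(100u, 7u, &r) returns 14 and sets r to 2.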


Known Problems and Solutions

1. Differences are detected when comparing Verilog simulation results against a C/C++ program executed on a PC.

Solution: C/C++ compilers provide a float type compatible with IEEE 754 single precision numbers. Variables of the float type are 32 bits wide, as defined in the IEEE standard. However, a PC may perform the calculations in a wider format, such as 64-bit double precision (or 80-bit extended precision on the x87 FPU), and round the wide results to 32 bits only once the operation is finished. The result of an arithmetic operation performed in the wider format and then rounded to 32 bits can differ from the result of the same operation performed on 32 bits. In the case of a single addition or multiplication, the mantissa may differ by ±1. If more operations are performed serially, the difference may be larger. Perform the calculations on 32 bits to get results matching the Verilog simulation. You can also use the VHDL behavioral models provided in the floating point package of the IEEE library.

2. The result of a single addition or multiplication differs by much more than ±1 when comparing Verilog simulation results against C/C++.

Solution: Check whether the difference occurs for a NaN (Not a Number). The IEEE standard defines a NaN as a number with all exponent bits set to one and a non-zero mantissa. The contents of the mantissa are not significant, but they must differ from zero. Usually, the mantissa MSB is set to one when a NaN result is produced. The remaining mantissa bits are left untouched and may contain values from previous calculations. Always check the status flags, and do not compare NaNs when they occur.
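Both problems can be handled in a C reference program. The sketch below is hypothetical and its function names are illustrative. sum_single keeps every intermediate result in 32 bits, as recommended in the first solution, while sum_wide accumulates in double precision and rounds only once at the end. results_match compares bit patterns: any two NaNs match regardless of their mantissa payload, infinities must match exactly, and other values may differ by up to max_ulps units in the last place. Treating values of opposite sign as a mismatch is a simplification of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sum in single precision: every intermediate result is rounded to 32 bits. */
static float sum_single(const float *x, int n) {
    assert(n >= 0);
    volatile float acc = 0.0f;  /* volatile forces a 32-bit store at each step */
    for (int i = 0; i < n; ++i)
        acc = acc + x[i];
    return acc;
}

/* Sum in double precision and round to 32 bits only once, at the end. */
static float sum_wide(const float *x, int n) {
    assert(n >= 0);
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += x[i];
    return (float)acc;
}

static bool is_nan_bits(uint32_t bits) {
    return ((bits >> 23) & 0xFFu) == 0xFFu && (bits & 0x7FFFFFu) != 0;
}

/* Compare a simulation result with a reference result, both as bit patterns. */
static bool results_match(uint32_t dut, uint32_t ref, uint32_t max_ulps) {
    bool dut_nan = is_nan_bits(dut), ref_nan = is_nan_bits(ref);
    if (dut_nan || ref_nan)
        return dut_nan && ref_nan;  /* NaN payloads are not compared */
    if ((dut & 0x7FFFFFFFu) == 0 && (ref & 0x7FFFFFFFu) == 0)
        return true;                /* +0 and -0 compare equal */
    if ((dut & 0x7FFFFFFFu) == 0x7F800000u || (ref & 0x7FFFFFFFu) == 0x7F800000u)
        return dut == ref;          /* infinities must match exactly */
    if ((dut ^ ref) & 0x80000000u)
        return false;               /* opposite signs: mismatch in this sketch */
    uint32_t diff = dut > ref ? dut - ref : ref - dut;
    return diff <= max_ulps;        /* same sign: patterns are ordered like values */
}
```

For the input {16777216.0f, 1.0f, 1.0f}, sum_single returns 16777216.0f, because each 16777216 + 1 rounds back to 16777216 (a tie resolved to the even value), while sum_wide returns 16777218.0f. The two results differ by one unit in the last place, which is the kind of difference described in the first problem.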


License

This library is provided under a modified BSD license; the advertising clause of the original BSD license has been removed:

Copyright (C) 2009 Pulse Logic [email protected]

This library may be used and distributed without restriction provided that this copyright statement is not removed from the file and that any derivative work contains the original copyright notice and the associated disclaimer. THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
