
IMPLEMENTATION OF 64 BIT SIGNED AND UNSIGNED MULTIPLIER ON ALTERA QUARTUS-II

PROJECT REPORT (Phase – II)

Submitted by GEETHANJALI D Register No: 1012208006

in partial fulfillment for the award of the degree of

MASTER OF ENGINEERING in VLSI DESIGN

SONA COLLEGE OF TECHNOLOGY (AUTONOMOUS) SALEM – 636 005 ANNA UNIVERSITY: CHENNAI 600 025 JUNE 2014

SONA COLLEGE OF TECHNOLOGY, SALEM (AUTONOMOUS) DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

The project entitled “IMPLEMENTATION OF 64 BIT SIGNED AND UNSIGNED MULTIPLIER ON ALTERA QUARTUS-II” submitted by GEETHANJALI D (Reg. No 1012208006) is completed and may be accepted for being evaluated.

Date: Dr.K.R.KASHWAN (SUPERVISOR)

ANNA UNIVERSITY: CHENNAI 600 025

Certified that this project report “IMPLEMENTATION OF 64 BIT SIGNED AND UNSIGNED MULTIPLIER ON ALTERA QUARTUS-II” is the bona fide work of “GEETHANJALI D” who carried out the project work under my supervision.

SIGNATURE
Dr.K.R.KASHWAN
HEAD OF THE DEPARTMENT
Professor & Dean/HOD,
Department of ECE(PG),
Sona College Of Technology,
Salem-636 005

SIGNATURE
Dr.K.R.KASHWAN
SUPERVISOR
Professor & Dean/HOD,
Department of ECE(PG),
Sona College Of Technology,
Salem-636 005

Submitted for the project Viva-Voce examination held on

Internal Examiner

External Examiner

ACKNOWLEDGEMENT

Behind every achievement lies an unfathomable sea of gratitude to those who actuated it; without them it would never have come into existence. To them I lay the word of gratitude imprinted within me. I wish to express my sincere thanks to our respected Principal, Dr.V.JAYAPRAKASH, Ph.D, for all the blessings and help provided during the period of project work. I wish to express my sincere thanks to Dr.K.R.KASHWAN, Ph.D, Professor and Head of the Department of Electronics and Communication Engineering (PG), for the continuous help over the period of project work. I am indebted to my internal guide Dr.K.R.KASHWAN, Ph.D, Professor and Head of the Department of Electronics and Communication Engineering (PG), for his immense help, timely guidance, unstinted attention, constant inspiration and creative ideas over the period of project work. I also render grateful thanks to my Project Coordinator Mrs.V.MEENAKSHI, M.E., Assistant Professor, Department of Electronics and Communication Engineering (PG), for her help and excellent cooperation in leading my project work. I would like to extend my warmest thanks to all our Lab Technicians for helping me in this venture. Finally, I take this opportunity to extend my deep appreciation to my family and friends, for all that they meant to me during the crucial times of the completion of my project.

DATE:

GEETHANJALI D

ABSTRACT

This report compares a carry look-ahead adder (CLAA) based 64-bit signed and unsigned integer multiplier with a carry select adder (CSLA) based 64-bit signed and unsigned integer multiplier. The multiplier multiplies two 64-bit values and gives a 128-bit product. Using adder-based multipliers increases efficiency. In this design, the power, area and time delay of the two binary adders, carry select and carry look-ahead, are analyzed. To identify the better adder for implementation, both adders are used in signed and unsigned multipliers applied to an FFT design.

TABLE OF CONTENTS

CHAPTER NO.   TITLE

      ABSTRACT
      LIST OF TABLES
      LIST OF FIGURES
      LIST OF ABBREVIATIONS

1     INTRODUCTION
      1.1 GENERAL
          1.1.1 CHALLENGES
          1.1.2 LOW POWER VLSI
      1.2 MULTIPLIER
      1.3 ADDERS
      1.4 PROBLEM STATEMENT

2     SYSTEM ANALYSIS
      2.1 ANALYSIS OF CLASSIFIED BINARY ADDER ARCHITECTURE
          2.1.1 RIPPLE CARRY ADDER
          2.1.2 CONDITIONAL SUM ADDER
          2.1.3 ANALYSIS OF ADDERS
      2.2 HIGH SPEED MULTIPLIER
          2.2.1 BOOTH MULTIPLIER
          2.2.2 BOOTH MODIFIED ALGORITHM
      2.3 MULTIPLIER ARCHITECTURE FOR LOW POWER
          2.3.1 WALLACE TREE MULTIPLIER
          2.3.2 ARRAY MULTIPLIER
      2.4 UNSIGNED MULTIPLIER
          2.4.1 MULTIPLIER FOR UNSIGNED DATA
          2.4.2 MULTIPLIER ALGORITHM
      2.2 PROPOSED SYSTEM

3     SYSTEM SPECIFICATION
      3.1 SOFTWARE SPECIFICATION
      3.2 HARDWARE SPECIFICATION

4     SOFTWARE SPECIFICATION
      4.1 MODELSIM
      4.2 QUARTUS-II
      4.3 VERILOG
      4.4 ALTERA FPGA

5     PROJECT DESCRIPTION
      5.1 OVERVIEW OF THE PROJECT
      5.2 BLOCK DIAGRAM
      5.3 BAUGH WOOLEY MULTIPLIER
      5.4 FULL ADDER
      5.5 CARRY SELECT LEVEL ADDER
      5.6 CARRY LOOK AHEAD ADDER
      5.7 SIGNED AND UNSIGNED BIT

6     RESULTS AND DISCUSSIONS
      6.1 SIMULATION RESULT FOR UNSIGNED BIT MULTIPLIER
      6.2 SIMULATION RESULT FOR SIGNED BIT MULTIPLIER
      6.3 SIMULATION RESULT FOR FFT INPUT
          6.3.1 SIMULATION RESULT FOR FFT 1st STAGE
          6.3.2 SIMULATION RESULT FOR FFT 2nd STAGE
          6.3.3 SIMULATION RESULT FOR FFT 3rd STAGE
          6.3.4 SIMULATION RESULT FOR FFT 4th STAGE
          6.3.5 SIMULATION RESULT FOR FFT OUTPUT FOR CSLA
          6.3.6 SIMULATION RESULT FOR FFT OUTPUT FOR CLAA
          6.3.7 IMPLEMENTATION
      6.4 ANALYSIS AND SYNTHESIS RESULTS

7     CONCLUSION

8     APPENDIX
      8.1 SOURCE CODE
      8.2 CONFERENCE CERTIFICATE

9     REFERENCES

LIST OF TABLES

TABLE NO.   NAME OF TABLE

1           THEORETICAL COMPARISON OF AREA OCCUPIED
2           THEORETICAL COMPARISON OF TIME REQUIRED
3           THEORETICAL COMPARISON OF AREA DELAY PRODUCT
4           VALUE OF BITS IN SIGNED AND UNSIGNED
5           COMPARISON TABLE

LIST OF FIGURES

FIGURE NO.  NAME OF FIGURE

1           A 4 BIT RCA
2           CONDITIONAL SUM ADDER
3           THE MODIFIED BOOTH ALGORITHM
4           ARCHITECTURE OF ARRAY AND WALLACE TREE
5           ARRAY MULTIPLIER
6           MULTIPLIER FOR UNSIGNED DATA
7           MULTIPLIER OF 2n BIT VALUES
8           BLOCK DIAGRAM
9           RECONFIGURABLE CELL
10          FULL ADDER
11          CARRY SELECT LEVEL ADDER
12          CARRY LOOK AHEAD ADDER
13          UNSIGNED BIT MULTIPLIER
14          SIGNED BIT MULTIPLIER
15          FFT INPUT
16          FFT 1st STAGE
17          FFT 2nd STAGE
18          FFT 3rd STAGE
19          FFT 4th STAGE
20          FFT OUTPUT FOR CSLA
21          FFT OUTPUT FOR CLAA
22          HARDWARE IMPLEMENTATION

LIST OF ABBREVIATIONS

ABBREVIATION   NAME

CLAA           CARRY LOOK AHEAD ADDER
CSLA           CARRY SELECT ADDER
COSA           CONDITIONAL SUM ADDER
DA             DESIGN ARCHITECTURE
DSP            DIGITAL SIGNAL PROCESSING
DRC            DESIGN RULE CHECK
FFT            FAST FOURIER TRANSFORM
FPGA           FIELD PROGRAMMABLE GATE ARRAY
LVS            LAYOUT VERSUS SCHEMATIC CHECK
MSB            MOST SIGNIFICANT BIT
PDP            POWER DELAY PRODUCT
RCA            RIPPLE CARRY ADDER
SDL            SCHEMATIC DRIVEN LAYOUT

CHAPTER 1
INTRODUCTION

1.1 GENERAL
Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The first semiconductor chips held two transistors each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's microprocessors have many millions of gates and billions of individual transistors. At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use.

1.1.1 CHALLENGES
As microprocessors become more complex due to technology scaling, microprocessor designers have encountered several challenges which force them to think beyond the design plane and look ahead to post-silicon.

• Power usage/Heat dissipation – As threshold voltages have ceased to scale with advancing process technology, dynamic power dissipation has not scaled proportionally. Maintaining logic complexity when scaling the design down only means that the power dissipation per area will go up. This has given rise to techniques such as dynamic voltage and frequency scaling (DVFS) to minimize overall power.
• Process variation – As photolithography techniques tend closer to the fundamental laws of optics, achieving high accuracy in doping concentrations and etched wires is becoming more difficult and prone to errors due to variation. Designers now must simulate across multiple fabrication process corners before a chip is certified ready for production.
• Stricter design rules – Due to lithography and etch issues with scaling, design rules for layout have become increasingly stringent. Designers must keep ever more of these rules in mind while laying out custom circuits. The overhead for custom design is now reaching a tipping point, with many design houses opting to switch to electronic design automation (EDA) tools to automate their design process.
• Timing/design closure – As clock frequencies tend to scale up, designers are finding it more difficult to distribute and maintain low clock skew between these high-frequency clocks across the entire chip. This has led to a rising interest in multi-core and multiprocessor architectures, since an overall speedup can be obtained by lowering the clock frequency and distributing processing.
• First-pass success – As die sizes shrink (due to scaling) and wafer sizes go up (to lower manufacturing costs), the number of dies per wafer increases, and the complexity of making suitable photomasks goes up rapidly. A mask set for a modern technology can cost several million dollars. This non-recurring expense deters the old iterative philosophy involving several "spin-cycles" to find errors in silicon, and encourages first-pass silicon success.

The scaling of silicon technology has been ongoing for over forty years. We are on our way to commercializing devices having a minimum feature size of one tenth of a micron. The push for miniaturization comes from the demand for higher functionality and higher performance at a lower cost. As a result, successively higher levels of integration have been driving up the power consumption of chips. Today heat removal and power distribution are at the forefront of the problems faced by chip designers.

1.1.2 LOW POWER VLSI
From the invention of the transistor, decades ago, through the years leading to the 1990s, power dissipation, though not entirely ignored, was of little concern. Low power mattered mainly in applications powered by a battery, such as pocket calculators, hearing aids, implantable pacemakers and portable military equipment used by individual soldiers. Low power design basically involves two concomitant tasks:

• Power estimation and analysis
• Power minimization

These tasks need to be carried out at each of the levels in the design hierarchy, namely the behavioral, architectural, logic, circuit and physical levels. The design of portable devices requires consideration of peak power consumption to ensure reliability and proper operation. However, the time-averaged power is often more critical, as it is linearly related to the battery life. There are four sources of power dissipation in digital CMOS circuits: switching power, short-circuit power, leakage power and static power. The following equations describe these four components of power:

$$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} + P_{static} \tag{1.1}$$

$$P_{avg} = \alpha C_L V_{dd} V_s f_{ck} + I_{sc} V_{dd} + I_{leakage} V_{dd} + I_{static} V_{dd} \tag{1.2}$$

$P_{switching}$ is the switching power. For a properly designed CMOS circuit, this power component usually dominates, and may account for more than 90% of the total power. $\alpha$ denotes the transition activity factor, which is defined as the average number of power-consuming transitions made at a node in one clock period. $V_s$ is the voltage swing, which in most cases is the same as the supply voltage $V_{dd}$.
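As a rough illustration of Eq. (1.2), the following Python sketch evaluates the four power components for one set of operating conditions. The function name, argument names and the numeric values in the usage note are assumptions made for illustration only; they are not measurements from this project.

```python
def average_power(alpha, c_load, v_dd, f_clk,
                  i_sc=0.0, i_leak=0.0, i_static=0.0, v_swing=None):
    """Evaluate the four terms of Eq. (1.2); all quantities in SI units."""
    if v_swing is None:
        # In most cases the voltage swing equals the supply voltage.
        v_swing = v_dd
    components = {
        "switching": alpha * c_load * v_dd * v_swing * f_clk,
        "short_circuit": i_sc * v_dd,
        "leakage": i_leak * v_dd,
        "static": i_static * v_dd,
    }
    components["total"] = sum(components.values())
    return components


# Hypothetical operating point: activity 0.2, 50 pF switched load, 1.2 V, 100 MHz.
example = average_power(alpha=0.2, c_load=50e-12, v_dd=1.2, f_clk=100e6)
```

With full voltage swing the switching term grows with the square of the supply voltage, so in this sketch halving `v_dd` reduces the switching power to one quarter.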

1.2 MULTIPLIER
A multiplier is one of the key hardware blocks in most digital and high-performance systems such as FIR filters, digital signal processors and microprocessors. With advances in technology, many researchers have tried, and are still trying, to design multipliers that offer high speed, low power consumption, regularity of layout (and hence less area), or a combination of these, making them suitable for high-speed, low-power and compact VLSI implementations. However, area and speed are two conflicting constraints: improving speed generally results in a larger area. The aim here is to find the best trade-off between the two. Multiplication proceeds in two basic steps: generation of the partial products, followed by their addition. Hence, different adders were designed first and compared in terms of speed and circuit complexity, i.e. the area occupied. A Wallace tree multiplier was then designed, followed by a Booth-encoded Wallace multiplier, and their speed and power consumption were compared. The comparison of the adders showed that the Ripple Carry Adder has a smaller area but lower speed, whereas Carry Select Adders are fast but occupy a larger area. The Carry Look Ahead Adder lies in between, offering a proper trade-off between time and area complexity. After designing and comparing the adders, the multipliers were considered: first a parallel multiplier and then a Wallace tree multiplier. It was found that the delay was considerably reduced when Carry Save Adders were used in the Wallace tree. Next, a radix-4 modified Booth multiplier was designed and the performance of all the multipliers was analyzed. Different methods of power optimization were then examined, of which only a few could be completed, such as designing different recoding schemes and their corresponding partial product (PP) generator schemes. These recoders and PP generators were designed, and the time delay, area and power consumed by each scheme were determined. Since the PP generators take a large amount of area, the simplest designs were preferred for them, while also ensuring that the circuit does not have much switching activity.

1.3 THE ADDERS
Addition is the most common and most often used arithmetic operation in microprocessors, digital signal processors and, especially, digital computers. It also serves as a building block for synthesizing all other arithmetic operations. Therefore, for the efficient implementation of an arithmetic unit, the binary adder structure becomes a very critical hardware unit. In computer arithmetic there exists a large number of different circuit architectures with different performance characteristics that are widely used in practice. Although much research has dealt with binary adder structures, studies based on their comparative performance analysis are only a few. In this project, qualitative evaluations of the classified binary adder architectures are given. Among the large number of adders, VHDL (a hardware description language) code was written for the ripple-carry, carry-select and carry-look-ahead adders to emphasize the common performance properties of their classes. A brief description of the studied adder architectures follows.

With respect to asymptotic delay time and area complexity, the binary adder architectures can be categorized into four primary classes. The asymptotic figures are the highest exponent terms of the exact formulas, which become very complex for large operand bit lengths. The first class consists of the very slow ripple-carry adder with the smallest area. In the second class, the carry-skip and carry-select adders with multiple levels have small area requirements and shortened computation times. From the third class, the carry-look-ahead adder, and from the fourth class, the parallel prefix adder, represent the fastest addition schemes with the largest area.

In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. The CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry by considering carry inputs Cin=0 and Cin=1; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin=1 in the regular CSLA to achieve lower area and power consumption. The main advantage of the BEC logic comes from its smaller number of logic gates compared with the n-bit Full Adder (FA) structure. The details of the BEC logic, the delay and area evaluation methodology of the basic adder blocks, and that of the regular and modified CSLA are discussed in turn. The CSLA has been chosen for comparison with the proposed design as it has a more balanced delay and requires lower power and area. The ASIC implementation details and results are then analyzed.

The carry-select adder partitions the adder into several groups, each of which performs two additions in parallel. Therefore, two copies of a ripple-carry adder act as the carry evaluation block per select stage. One copy evaluates the carry chain assuming the block carry-in is zero, while the other assumes it to be one. Once the carry signals are finally computed, the correct sum and carry-out signals are simply selected by a set of multiplexers. FA and HA are abbreviations for full adder and half adder, respectively; a half adder is equivalent to a full adder with a constant carry-in of logic 0. The main drawback of the conventional CSL is the doubling of the area cost to duplicate another adder. When all sum bits equal one, the first-zero detection circuit generates a one at the final node. For all the other cases, it generates a zero carry-out. As opposed to using dual RCAs in the conventional CSL, the architecture of the contemporary CSL adder comprises a single RCA, a first-zero detection and selective complement add-one circuit, and a carry-select multiplexer circuit. An AND, OR and Inverter (AOI) implementation of an XOR gate is used in the carry select adder circuit.
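The selection principle described above can be sketched as a small behavioural model in Python. This is a minimal sketch under stated assumptions, not the Verilog or VHDL used in the project: the function names, the fixed block size of 4 and the `use_bec` switch are illustrative choices, and for simplicity every block (including the least significant one) computes both candidates.

```python
def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two equal-length bit lists (LSB first) with a rippling carry."""
    sum_bits, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def bec_add_one(sum_bits, carry):
    """Binary to Excess-1 conversion: add 1 to the result computed for Cin=0."""
    out, propagate = [], 1
    for s in sum_bits:
        out.append(s ^ propagate)
        propagate &= s          # the +1 keeps rippling while bits are 1
    return out, carry | propagate


def carry_select_add(a, b, width=16, block=4, use_bec=False):
    """Carry-select addition of two unsigned integers of the given width."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result_bits, carry = [], 0
    for start in range(0, width, block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        sum0, c0 = ripple_carry_add(a_blk, b_blk, 0)      # assumes Cin = 0
        if use_bec:
            sum1, c1 = bec_add_one(sum0, c0)              # BEC replaces 2nd RCA
        else:
            sum1, c1 = ripple_carry_add(a_blk, b_blk, 1)  # assumes Cin = 1
        # The actual carry from the previous block drives the 2:1 multiplexers.
        chosen, carry = (sum1, c1) if carry else (sum0, c0)
        result_bits.extend(chosen)
    total = sum(bit << i for i, bit in enumerate(result_bits))
    return total, carry
```

Both variants return the same sum and carry-out; the BEC variant only changes how the Cin=1 candidate is produced, which is where the area saving is claimed in hardware.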
The gates between the dotted lines perform their operations in parallel, and the number marked on each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR and Inverter gates, each having a delay of 1 unit and an area of 1 unit. The delay is found by adding up the number of gates in the longest path of a logic block, and the area by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA) and FA are evaluated.

A carry-select adder is divided into sectors, each of which, except for the least significant, performs two additions in parallel, one assuming a carry-in of zero, the other a carry-in of one. Consider a 16-bit carry-select adder divided into sectors. A 4-bit sector illustrates the general principle: within the sector there are two 4-bit ripple-carry adders receiving the same data inputs but different carry-ins. The upper adder has a carry-in of zero; the lower adder a carry-in of one. The actual carry-in from the preceding sector selects one of the two adders. If the carry-in is zero, the sum and carry-out of the upper adder are selected. If the carry-in is one, the sum and carry-out of the lower adder are selected.

The carry-select method is deemed to be a good compromise between cost and performance in carry propagation adder design. However, the conventional carry-select adder (CSLA) is still area-consuming due to the dual ripple carry adder structure. The excessive area overhead makes the CSLA relatively unattractive, but this has been circumvented by the use of the add-one circuit introduced recently. In this work, an area-efficient square root CSLA scheme based on a new first-zero detection logic is proposed. The proposed CSLA shows a notable power-delay and area-delay performance improvement by virtue of proper exploitation of logic structure and circuit technique.
To determine the optimal variable block sizes, the latencies of the primitive gates used in the conventional 64-bit CSLA have been simulated for the same driving strength and standard output loading. HA and FA are built with transmission gates to speed up the worst-case delay. The delay time of MUX refers to the delay of the multiplexer from the select signal to the output signal, and MUX (thru) refers to the delay from the input signals to be selected to the output signal. FA (sum) and HA (sum) refer to the delays from the input to the sum output. The delays from the input to the carry output are similarly annotated with "(Cout)". According to these basic gate latencies, it is evident that there will be a mismatch of arrival time between the carry-select signal and the sum signals at the MUX in a square root CSLA. The delays through both paths can be equalized by progressively adding more bits to the subsequent stages of adder groups, so that more time is required for the generation of the carry signals. Thus the block sizes of the 64-bit CSLA can be determined: starting from a two-bit RCA per group for the first two groups, the bits beyond the fifth bit are grouped in such a way that the number of bits in the group increases by one progressively. In this way the discrepancy in arrival time at the MUX nodes is minimized. As the block delay of the conventional square root CSLA is very similar, the same configuration of CSLA block sizes has been adopted in the proposed design.
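The progressive grouping can be sketched as follows. This is only one reading of the description: the helper name is hypothetical, the sequence is assumed to be 2, 2, 3, 4, ... bits per group, and truncating the last group so that the total is exactly 64 bits is an assumption, since the text does not state how the final group is sized.

```python
def sqrt_csla_block_sizes(width=64):
    """Group sizes for a square-root style CSLA: two 2-bit groups, then +1 each."""
    sizes, size, remaining = [], 2, width
    while remaining > 0:
        take = min(size, remaining)   # assumption: the last group is truncated
        sizes.append(take)
        remaining -= take
        if len(sizes) >= 2:           # grow only after the first two groups
            size += 1
    return sizes


# 64-bit adder under the assumptions above.
blocks_64 = sqrt_csla_block_sizes(64)
```

Under these assumptions a 64-bit adder is split into eleven groups, so the carry-select chain passes through far fewer multiplexer stages than a design with uniform 2-bit groups would need.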

1.4 PROBLEM STATEMENT
The problem addressed in this project is that existing multipliers yield only unsigned values; signed values are not obtained. A multiplier built from full adders and AND gates produces only unsigned results, and unsigned values are never negative. Because signed values are not supported, such a multiplier cannot be used in the fast Fourier transform, which needs both signed and unsigned values when calculating with the twiddle factors in DSP applications. It also has a higher delay and power delay product (PDP) owing to the larger number of add/shift stages. With the various adders it consumes more area, more power and long computation times. Multipliers of this type are therefore not widely used.

CHAPTER 2
SYSTEM ANALYSIS

2.1 ANALYSIS OF CLASSIFIED BINARY ADDER ARCHITECTURES
Sertbas and Selami worked on the performance analysis of classified binary adder architectures [4]. They compared the ripple carry adder, carry look-ahead adder, carry select adder and conditional sum adder. Their work included unit-gate models for area and delay.

2.1.1 RIPPLE CARRY ADDERS (RCA)
The well-known ripple carry adder is constructed by cascading full adder blocks in series, as shown in Figure 1. The carry-out of one stage is fed directly to the carry-in of the next stage. An n-bit parallel adder requires n full adders.

Figure 1. A 4-bit Ripple Carry Adder

• Not very efficient when operands with a large number of bits are used.
• Delay increases linearly with bit length.

DELAY
The delay from carry-in to carry-out is more important than that from A to carry-out or from carry-in to SUM, because the carry-propagation chain determines the latency of the whole circuit for a ripple-carry adder. Considering this worst-case signal propagation path, the worst-case path delay of a k-bit RCA is

$$T_{RCA\text{-}k\,bit} = T_{FA}(x_0, y_0 \to c_0) + (k-2)\, T_{FA}(c_{in} \to c_i) + T_{FA}(c_{in} \to s_{k-1})$$

LOGIC EQUATIONS

$$g_i = a_i b_i, \qquad p_i = a_i \oplus b_i, \qquad c_{i+1} = g_i + p_i c_i, \qquad s_i = p_i \oplus c_i$$

COMPLEXITY AND DELAY FOR n-BIT RCA STRUCTURE

$$A_{RCA} = O(n) = 7n, \qquad T_{RCA} = O(n) = 2n$$
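The logic equations above translate directly into a bit-serial software model. The sketch below is a behavioural illustration in Python with assumed function names; the cost helper simply restates the unit-gate figures 7n and 2n quoted above.

```python
def rca_add(a, b, n, carry_in=0):
    """Ripple carry addition of two n-bit unsigned integers using g, p, c, s."""
    total, carry = 0, carry_in
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi                 # generate:  g_i = a_i b_i
        p = ai ^ bi                 # propagate: p_i = a_i xor b_i
        total |= (p ^ carry) << i   # s_i = p_i xor c_i
        carry = g | (p & carry)     # c_{i+1} = g_i + p_i c_i
    return total, carry


def rca_unit_gate_cost(n):
    """Unit-gate area and delay of an n-bit RCA as quoted in the text."""
    return {"area": 7 * n, "delay": 2 * n}
```

For example, `rca_add(0b1011, 0b0110, 4)` adds 11 and 6 in a 4-bit adder; the 4-bit sum wraps to 1 and the carry-out is 1, which together represent 17.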

2.1.2 CONDITIONAL SUM ADDER (COSA)
Another fast addition scheme is the conditional sum adder, which is based on generating two sets of outputs for a given group of operand bits. Each set includes the sum bits and an outgoing carry. One set assumes the incoming carry is zero (0), the other that it is one (1). Once the incoming carry is known, the correct set of outputs is simply selected, without waiting for the carry to propagate through the group. In this method, the n-bit operands are divided into smaller groups, so that the serial carry propagation inside the separate groups can be done in parallel, increasing the speed of the adder. In principle, the division into groups can be continued down to groups of size 1. In this case, the whole addition is done in log2 n steps, which is the number of multiplexer levels used. In Fig. 2, the conditional sum method is shown for the addition of two 8-bit binary numbers.

Figure 2. The two sets of outputs (DHA)

THE LOGIC EQUATIONS

$$c^0_{i+1} = a_i b_i, \qquad s^0_i = a_i \oplus b_i$$

$$c^1_{i+1} = a_i + b_i, \qquad s^1_i = \overline{s^0_i}$$

COMPLEXITY AND DELAY FUNCTIONS FOR n-BIT COSA

$$A_{COSA} = O(n \log n) = 3n \log_2 n, \qquad T_{COSA} = O(\log n) = 2 \log_2 n$$
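A behavioural sketch of the conditional sum scheme is given below, assuming groups are paired level by level starting from single bits. The function name and the returned level count are illustrative additions; the level count corresponds to the number of multiplexer levels mentioned in the text.

```python
def conditional_sum_add(a, b, n):
    """Add two n-bit unsigned integers with the conditional sum scheme."""
    # Level 0: each bit position holds two candidate results, index 0 for an
    # incoming carry of 0 and index 1 for an incoming carry of 1.
    groups = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s0, c0 = ai ^ bi, ai & bi        # outputs assuming carry-in = 0
        s1, c1 = s0 ^ 1, ai | bi         # outputs assuming carry-in = 1
        groups.append((([s0], c0), ([s1], c1)))

    levels = 0
    while len(groups) > 1:
        merged = []
        for j in range(0, len(groups) - 1, 2):
            low, high = groups[j], groups[j + 1]
            pair = []
            for cin in (0, 1):
                low_bits, low_carry = low[cin]
                # The low group's carry selects one of the high group's sets.
                high_bits, high_carry = high[low_carry]
                pair.append((low_bits + high_bits, high_carry))
            merged.append(tuple(pair))
        if len(groups) % 2:
            merged.append(groups[-1])    # odd group passes through unchanged
        groups = merged
        levels += 1

    bits, carry_out = groups[0][0]       # the adder's own carry-in is 0
    total = sum(bit << i for i, bit in enumerate(bits))
    return total, carry_out, levels
```

For two 8-bit operands the model needs three selection levels, in line with the log2 n steps described above.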

2.1.3 ANALYSIS OF ADDERS
Three different adders were compared: Ripple Carry Adders, Carry Select Adders and Carry Look Ahead Adders. The basic purpose of this experiment was to learn the time and power trade-offs between the different adders, which gives a clear picture of which adder suits which type of situation during the design process. The theoretical and practical comparisons of the three adders taken into consideration are presented below.

ADDER NAME                      COMPLEXITY (A)    AREA FOR n-BIT
Ripple Carry Adder (RCA)        O(n)              7n
Carry Select Adder (CSA)        O(n)              14n
Carry Look Ahead Adder (CLA)    O(n)              4n

Table 1. Theoretical comparison of area occupied

ADDER NAME                      COMPLEXITY (T)    DELAY FOR n-BIT
Ripple Carry Adder (RCA)        O(n)              2n
Carry Select Adder (CSA)        O(n^(1/(l+1)))    2.8 n^(1/2)
Carry Look Ahead Adder (CLA)    O(log2 n)         4 log2 n

Table 2. Theoretical comparison of time required

ADDER NAME                      DELAY FOR n-BIT   AREA FOR n-BIT   AREA DELAY PRODUCT
Ripple Carry Adder (RCA)        2n                7n               14 n^2
Carry Select Adder (CSA)        2.8 n^(1/2)       14n              39.6 n^(3/2)
Carry Look Ahead Adder (CLA)    4 log2 n          4n               16 n log2 n

Table 3. Theoretical comparison of area delay product

The tables above state the theoretical comparison of the area required and of the time required. The values are for n-bit adders. Analyzing them leads to the following conclusions about the different adders and their intelligent use in different circumstances according to the space-time trade-off. The results can be summarized as follows.

• Regarding circuit area complexity, the ripple carry adder (RCA) in the first class is the most efficient, while the carry select adder (CSLA), with the highest complexity of the three in Table 1, is the least efficient.
• Considering the circuit delay times in Table 2, the Carry Select Adder (CSLA) has the shortest delay of the three for operand widths up to 64 bits, and the Ripple Carry Adder (RCA) is the slowest because of its long carry propagation.
• The Area-Delay Product gives a clear picture of the space-time trade-off. It is worth noting that, among the adders discussed above, Ripple Carry Adders and Carry Select Adders are at the two ends of the spectrum.
• Ripple Carry Adders have a smaller area and lower speed, whereas Carry Select Adders have high speed (nearly twice the speed of Ripple Carry Adders) and occupy a larger area. The Carry Look Ahead Adder (CLA) strikes a proper balance between the area occupied and the time required. Hence, among the three, the Carry Look Ahead Adder has the least area-delay product, and Carry Look Ahead Adders should be used when optimizing for both area and time.
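To make the comparison concrete, the formulas of Tables 1 to 3 can be evaluated for a given operand width. The sketch below only restates the tabulated unit-gate expressions in Python; the function name and dictionary layout are assumptions, and the area-delay entries use the coefficients printed in Table 3 rather than recomputing area times delay.

```python
import math


def unit_gate_figures(n):
    """Area, delay and area-delay product from Tables 1-3 for n-bit adders."""
    log_n = math.log2(n)
    return {
        "RCA": {"area": 7 * n, "delay": 2 * n, "adp": 14 * n ** 2},
        "CSA": {"area": 14 * n, "delay": 2.8 * math.sqrt(n),
                "adp": 39.6 * n ** 1.5},
        "CLA": {"area": 4 * n, "delay": 4 * log_n, "adp": 16 * n * log_n},
    }


def best_adder(n, metric):
    """Name of the adder with the smallest value of the given metric."""
    figures = unit_gate_figures(n)
    return min(figures, key=lambda name: figures[name][metric])


figures_64 = unit_gate_figures(64)
```

For n = 64 these formulas give the carry select adder the smallest delay and the carry look-ahead adder the smallest area-delay product, while for n = 128 the logarithmic delay of the carry look-ahead adder becomes the smallest.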

2.2 HIGH SPEED MULTIPLIER
Asadi and Navi developed a new 54 x 54 bit multiplier using a high-speed carry-look-ahead adder [5]. They used a delay-insensitive carry look-ahead adder. Their proposed multiplier reduced the number of transistors, the delay and the power consumption.

2.2.1 BOOTH MULTIPLIER
Though Wallace tree multipliers are faster than the traditional carry-save method, they are very irregular and hence complicated to lay out. When the multiplier width goes beyond 32 bits, a large number of logic gates is required, and hence more interconnecting wires, which makes the chip design large and slows down the operating speed. A Booth multiplier can be used in different modes such as radix-2, radix-4 and radix-8. Radix-4 Booth's algorithm is used here because it reduces the number of partial products to n/2.

2.2.2 BOOTH MULTIPLICATION ALGORITHM
One way to realize high-speed multipliers is to enhance parallelism, which helps to decrease the number of subsequent calculation stages. The original version of Booth's multiplier (radix-2) had two drawbacks:

• The number of add/subtract operations is variable, which is inconvenient when designing parallel multipliers.
• The algorithm becomes inefficient when there are isolated 1s.

These problems are overcome by using the radix-4 Booth's algorithm, which scans strings of three bits. The design of the Booth multiplier in this project consists of four modified Booth encoders (MBE), four sign extension correctors, four partial product generators (each comprising a 5:1 multiplexer) and finally a Wallace tree adder. The Booth multiplier technique increases speed by reducing the number of partial products by half. Since an 8-bit Booth multiplier is used in this project, only four partial products need to be added instead of the eight partial products generated by a conventional multiplier. The architecture of the modified Booth's algorithm is shown in Fig. 3.

Fig. 3 The modified Booth's algorithm

• Modified Booth encoding is most often used to avoid variable-size partial product arrays.
• The partial product generator is the combination of the product generator and the 5-to-1 MUX circuit. The product generator is designed to produce the product by multiplying the multiplicand A by 0, 1, -1, 2 or -2.
• The sign extension corrector is designed to enhance the ability of the Booth multiplier to multiply not only unsigned numbers but also signed numbers.
• A Wallace tree is used in this project to accelerate multiplication by compressing the number of partial products.
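The radix-4 recoding described in this section can be illustrated with a short behavioural model. This is a sketch in Python with assumed function names, operating on ordinary signed integers rather than on the encoder, multiplexer and sign-extension hardware of the actual design; the partial products are summed directly instead of through a Wallace tree.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier into n/2 digits in {-2..2}."""
    if n % 2:
        raise ValueError("n must be even for radix-4 recoding")
    y &= (1 << n) - 1                 # view y as an n-bit pattern
    digits, prev = [], 0              # prev is the bit to the right (y_-1 = 0)
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # scan the three bits b1, b0, prev
        prev = b1
    return digits


def booth_radix4_multiply(x, y, n=8):
    """Multiply signed x by n-bit signed y using radix-4 Booth partial products."""
    partial_products = [
        d * x * (4 ** k)              # multiplicand times 0, 1, -1, 2 or -2
        for k, d in enumerate(booth_radix4_digits(y, n))
    ]
    return sum(partial_products), len(partial_products)
```

For 8-bit operands the recoder yields four digits, so only four partial products are summed, matching the count stated above.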

2.3 MULTIPLIER ARCHITECTURE FOR LOW POWER
Meier and Carley explored multiplier architecture and layout for low power [3]. They compared array and Wallace tree styles of unsigned multipliers over bit widths of 8 to 24 bits. Their analysis included first-order delay and area effects due to physical wiring. In their results, the partial product reduction hardware seems to offset the power lost in the wiring, offering improved energy and delay.

Fig.4 architecture of array and Wallace tree

2.3.1THE WALLACE TREE MULTIPLIER The Wallace tree multiplier is considerably faster than a simple array multiplier because its height is logarithmic in word size, not linear. However, in addition to the large number of adders required, the Wallace tree’s wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers, while design complexity is a concern to them. Wallace tree styles use a log-depth tree network for reduction. Faster, but irregular, they trade ease of layout for speed. Wallace tree styles are generally avoided for low power applications, since excess of wiring is likely to consume extra power. While subsequently faster than Carry-save structure for large bit multipliers, the Wallace tree multiplier has the disadvantage of being very irregular, which complicates the task of coming with an efficient layout. The Wallace tree multiplier is a high speed multiplier. The summing of the partial product bits in parallel using a tree of carry-save

adders became generally known as the ―Wallace Tree‖. Three step processes are used to multiply two numbers. 

Formation of bit products.



Reduction of the bit product matrix into a two row matrix by means of a carry save adder.



Summation of remaining two rows using a faster Carry Look Ahead Adder (CLA).
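The three steps can be sketched in Python as follows. This is a word-level illustration under stated assumptions (unsigned operands, hypothetical function names, rows grouped three at a time); it shows the reduction idea rather than the exact dot-by-dot Wallace wiring, and the final addition uses ordinary integer addition where the hardware would use a carry look-ahead adder.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied to whole words.

    Returns (sum_word, carry_word) such that x + y + z == sum_word + carry_word.
    No carry ripples between bit positions.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def wallace_multiply(a, b, width=8):
    """Unsigned multiply of two `width`-bit values by carry-save reduction."""
    # Step 1: form the bit products, one shifted row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    # Step 2: compress three rows into two until only two rows remain.
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # rows left over this level
        rows = reduced
    # Step 3: final carry-propagating addition of the remaining rows.
    return sum(rows)


print(wallace_multiply(11, 6))
```

For eight rows the row count falls 8, 6, 4, 3, 2, which is the logarithmic depth that makes the tree faster than a linear array.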

2.3.2 ARRAY MULTIPLIER

Array styles use a regular 2-D grid of adders for this reduction. Compact and easy to lay out, the arrays perform this reduction in gate depth that is linear in the bit width. At the other end of the spectrum, Wallace tree styles use a log-depth tree network for this reduction. Faster, but irregular, they trade ease of layout for speed. Although the speed-size tradeoffs for these two styles are fairly

Fig.5 Array multiplier

The two basic operations, the generation and the summation of partial products, can be merged, avoiding overhead and speeding up multiplication.

Iterative array multipliers (or array multipliers) consist of identical cells, each forming a new partial product and adding it to the previously accumulated partial product.

The gain in speed is obtained at the expense of extra hardware.

They can be implemented so as to support a high rate of pipelining.

2.4 UNSIGNED BIT MULTIPLIER

Seshadri and Ramakrishnan worked on the design and implementation of a 32-bit unsigned multiplier using CLAA and CSLA [1]. Their work compares the VLSI design of a carry look-ahead adder (CLAA) based 32-bit unsigned integer multiplier with that of a carry select adder (CSLA) based 32-bit unsigned integer multiplier. Both designs multiply two 32-bit unsigned integer values and give a 64-bit product.

2.4.1 MULTIPLIER FOR UNSIGNED DATA

Multiplication involves the generation of partial products, one for each digit in the multiplier, as in Figure 6. These partial products are then summed to produce the final product. The multiplication of two n-bit binary integers results in a product of up to 2n bits in length.

Figure 6 Multiplier for unsigned data

The above algorithm is used here to implement the multiplication operation for unsigned data.

2.4.2 MULTIPLICATION ALGORITHM

Let the product register size be 64 bits. Let the multiplicand register size be n bits. Store the multiplier in the least significant half of the product register. Clear the most significant half of the product register. Repeat the following steps n times:

1. If the least significant bit of the product register is "1", add the multiplicand to the most significant half of the product register.
2. Shift the content of the product register one bit to the right.
3. Shift the carry bit into the most significant bit of the product register.

Figure 7 shows a block diagram for such a multiplier [2].

Figure.7 Multiplier of two n-bit values.
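The register-level steps of section 2.4.2 can be traced with a small Python model. It is a sketch under stated assumptions: the function name is hypothetical, the product register is taken to be 2n bits wide (64 bits for the default n = 32), and the operands are unsigned.

```python
def shift_add_multiply(multiplicand, multiplier, n=32):
    """Unsigned shift-and-add multiply using a 2n-bit product register."""
    mask = (1 << n) - 1
    # Multiplier in the least significant half, most significant half cleared.
    product = multiplier & mask
    for _ in range(n):
        carry = 0
        if product & 1:
            # Step 1: add the multiplicand to the most significant half.
            high = (product >> n) + (multiplicand & mask)
            carry = high >> n                       # carry out of the n-bit addition
            product = ((high & mask) << n) | (product & mask)
        # Step 2: shift the product register one bit to the right.
        product >>= 1
        # Step 3: shift the carry into the most significant bit.
        product |= carry << (2 * n - 1)
    return product


print(shift_add_multiply(11, 6, n=4))
```

With n = 4 the register passes through 0000 0110, 0000 0011, 0101 1001, 1000 0100 and ends at 0100 0010, which is 66.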

2.2 PROPOSED SYSTEM

The problem with the existing system is that it handles only unsigned values; signed values are not supported. Its array is built from full adders with AND gates, which produce only unsigned results, and an unsigned value can only be zero or positive. Without signed operation the multiplier cannot be used in the fast Fourier transform, because the FFT needs both signed and unsigned values when calculating with the twiddle factors in DSP applications. The existing multiplier also has a higher delay and power delay product (PDP) because of its larger number of add/shift stages, and the various adders it uses consume more area, more power and longer computation times. Multipliers of this type are therefore not widely used.

The proposed solution is to use reconfigurable cells in place of full adder AND gate cells in the Baugh-Wooley multiplier. With the reconfigurable cells the multiplier produces both signed and unsigned results. Among the various multiplier techniques that have been introduced, the modified Booth algorithm is widely used for many computational applications. Adder-based multipliers work effectively, but they have a higher delay and PDP because of the larger number of add/shift stages. To eliminate these propagation delays, a fast multiplier is implemented using the Baugh-Wooley algorithm, which uses few logical operations in each step of the multiplication and therefore does not suffer large gate propagation delays. The CLAA reduces area and produces the carries faster, because additional circuitry generates the carry bits in parallel. The CSLA has small area requirements and shortened computation times.

CHAPTER 3 SYSTEM SPECIFICATION

SOFTWARE REQUIREMENTS

ModelSim 6.6
Quartus II 10.1 IDE

HARDWARE REQUIREMENTS

ALTERA DE2 EP2C35F672C6N BOARD

Altera Cyclone® II 2C35 FPGA device
Altera Serial Configuration device - EPCS16
USB Blaster (on board) for programming and user API control; both JTAG and Active Serial (AS) programming modes are supported
512-Kbyte SRAM
8-Mbyte SDRAM
4-Mbyte Flash memory (1 Mbyte on some boards)
SD Card socket
4 pushbutton switches
18 toggle switches
18 red user LEDs
9 green user LEDs
50-MHz oscillator and 27-MHz oscillator for clock sources
24-bit CD-quality audio CODEC with line-in, line-out, and microphone-in jacks
VGA DAC (10-bit high-speed triple DACs) with VGA-out connector
TV Decoder (NTSC/PAL) and TV-in connector
10/100 Ethernet Controller with a connector
USB Host/Slave Controller with USB type A and type B connectors
RS-232 transceiver and 9-pin connector
PS/2 mouse/keyboard connector
IrDA transceiver
Two 40-pin Expansion Headers with diode protection

CHAPTER 4 SOFTWARE SPECIFICATION

4.1 MODELSIM

ModelSim is a powerful simulator that can be used to simulate the behavior and performance of logic circuits. It is a simulation and debugging tool for VHDL, Verilog, SystemC and mixed-language designs, and it provides a comprehensive simulation and debug environment for complex ASIC and FPGA designs. Support is provided for multiple languages including Verilog, SystemVerilog, VHDL and SystemC. Mentor Graphics was the first to combine single kernel simulator (SKS) technology with a unified debug environment for Verilog, SystemVerilog, VHDL and SystemC. The combination of industry-leading performance and capacity with the best integrated debug and analysis environment makes ModelSim the simulator of choice for both ASIC and FPGA design. The best standards and platform support in the industry make it easy to adopt in the majority of process and tool flows.

ModelSim provides seamless, scalable performance and capabilities. A single compiler and library system is used for all ModelSim configurations, which makes it simple to employ the right configuration for the project's needs. ModelSim PE and LE enable individual engineers to develop and debug small to medium size design blocks on Windows and Linux. ModelSim SE combines high performance and high capacity with the code coverage and debugging capabilities required to simulate larger blocks and systems and attain ASIC gate-level sign-off. ModelSim SE can simulate very large designs through support of 32- and 64-bit UNIX and Linux and 32-bit Windows-based platforms. The ModelSim SE vopt usage mode achieves industry-leading performance and capacity through very aggressive global compile and simulation optimization algorithms for Verilog and VHDL. The vopt performance mode can improve Verilog and mixed VHDL/Verilog RTL simulation performance by up to 10X. It can also improve gate-level performance by up to 4X and capacity by over 2X.

4.2 QUARTUS II SIMULATOR

The Altera Quartus II software is used in this project for power analysis. Altera supports power estimation and analysis from design concept through implementation, with the most accurate and complete power management design tools. The version used is Quartus II 10.1, provided by Altera. Quartus II also helps to analyze power by means of the PowerPlay power analyzer. The basic steps for using Quartus II to design and implement a circuit specified in the Verilog hardware description language are as follows:

Create a project.

Synthesize a circuit from the Verilog code using the Quartus II Integrated Synthesis tool.

Fit the synthesized circuit into the Altera FPGA.

Examine the reports on the results of fitting and timing analysis.

Examine the synthesized circuit in the form of the schematic diagram generated by the RTL viewer.

4.3 VERILOG

Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is also used in the verification of analog circuits and mixed-signal circuits. Hardware description languages such as Verilog differ from software programming languages because they include ways of describing the propagation of time and signal dependencies. Verilog represented a tremendous productivity improvement for circuit designers who were already using graphical schematic capture software and specially written software programs to document and simulate electronic circuits. The designers of Verilog wanted a language with syntax similar to the C programming language, which was already widely used in engineering software development. Like C, Verilog is case-sensitive and has a basic preprocessor. A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and communicate with other modules through a set of declared input, output, and bidirectional ports.

4.4 ALTERA FPGA

Performing a hardware multiply is necessary in any system that contains Digital Signal Processing (DSP) functionality such as filtering, modulation, or video processing. Often there is an off-the-shelf component that the engineer can use to solve the problem. However, in some designs the expense of a dedicated DSP chip is not justified, and the use of a Field Programmable Gate Array (FPGA) is more viable. Another reason for using a programmable device to perform hardware multiplication is that the system parameters may be non-standard and would require expensive circuitry outside the DSP, or an over-powered DSP chip for mixed large and small operand combinations (a 12- by 4-bit multiplier, for example). Using an FPGA in cases like these gives complete flexibility, high performance, and lower cost compared with an off-the-shelf solution.

This application note describes various techniques available to the designer for performing high-speed signed multiply operations in Altera FPGAs. A brief discussion of the basics of digital multiplication is followed by discussions of how to implement the signed multiplier to meet the design requirements. Example implementations are shown, from the slower, purely combinatorial version to the fully pipelined version that runs at 200 MHz. The example designs are done in schematic as well as in Verilog and VHDL. They are generic designs, not "hard macros", so users can easily modify them to suit various design needs. All timing information presented in this application note is derived from worst-case simulations performed within the Quartus integrated Windows-based design software.

CHAPTER 5 PROJECT DESCRIPTION

5.1 OVERVIEW OF PROJECT

A multiplier is one of the key hardware blocks in most digital and high performance systems such as FIR filters, microprocessors and digital signal processors. Multipliers occupy a large area, have long latency and consume considerable power. Consequently, low-power multiplier design has been an important part of low-power VLSI system design, and there has been extensive work on low-power multipliers at the technology, physical, circuit and logic levels. System performance is normally determined by the performance of the multiplier, because the multiplier is generally the slowest element in the system. This project therefore uses the Baugh-Wooley multiplier for both signed and unsigned operands.

Addition is the most common and frequently used arithmetic operation in microprocessors, digital signal processors and digital computers in general. It also serves as a building block for the synthesis of all other arithmetic operations. For an efficient implementation of an arithmetic unit, the binary adder structure is therefore a very significant hardware unit. This project deals with a comparison at an operand width of 64 x 64 bits at the input and 128 bits at the output. DSP methods demand both low delay and low area in system design. The project compares the performance of two different adders used in the multipliers, based on the area and the time needed for computation. The work is significant because these are basic building blocks of arithmetic circuits, which are prevalent in VLSI architectures, DSP applications, computer applications and wherever reduced-area computation is required.

5.2 BLOCK DIAGRAM

The block diagram uses the Baugh-Wooley multiplier, which consists of full adders with AND gates (FAA cells) and reconfigurable cells (RC cells).

Fig. 8 Block diagram of signed and unsigned multiplier

A common 64 x 64 array is used for both signed and unsigned multiplication, with 3970 FAA cells and 126 RC cells.

Fig. 9 Reconfigurable cell

The reconfigurable cell consists of an AND gate and a NAND gate, and its output is selected by a MUX according to whether signed or unsigned multiplication is required. If the AND gate is selected, the cell acts as an FAA (full adder with AND gate) and produces the unsigned result. If the NAND gate is selected, it acts as an FAN (full adder with NAND gate) and produces the signed result. The final stage outputs are computed separately with a carry select adder and a carry look-ahead adder in order to identify an efficient multiplier for real-time applications.

5.3 BAUGH WOOLEY MULTIPLIER

The Baugh-Wooley algorithm for unsigned binary multiplication is based on the idea shown in fig 3. The algorithm specifies that all possible AND terms are produced first and then sent through an array of half adders and full adders, with the carry-outs chained to the next most significant bit at each level of addition. For signed multiplication, by using the properties of the two's complement system, the Baugh-Wooley algorithm can perform the multiplication in almost the same way as the unsigned multiplication described above. The hardware implementation is very similar to the unsigned implementation. The difference is that, except for the AND term involving the MSBs of both the multiplicand and the multiplier, all other AND terms involving the MSB of the multiplicand or the multiplier are inverted before being fed into the adder array.
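The signed and unsigned behaviour of the array can be checked with a bit-level Python model. This is a behavioural sketch, not the Verilog of this project: the function name, the `signed` flag and the default width are assumptions made for the example. In signed mode the cells that involve exactly one operand MSB switch from AND to NAND, and the two usual Baugh-Wooley correction constants (a 1 at bit n and a 1 at bit 2n-1) are added. For n = 64 there are 2 x 63 = 126 such cells, which matches the RC cell count quoted in section 5.2.

```python
def baugh_wooley_multiply(a, b, n=4, signed=True):
    """Multiply two n-bit operands with one array for both modes.

    signed=False: every cell is AND (FAA behaviour), operands are unsigned.
    signed=True : cells touching exactly one MSB are NAND (FAN behaviour),
                  operands and result are two's complement.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            # Reconfigurable cell: invert the term when exactly one index is the MSB.
            if signed and ((i == n - 1) != (j == n - 1)):
                bit ^= 1
            total += bit << (i + j)
    if signed:
        total += (1 << n) + (1 << (2 * n - 1))  # correction constants
    raw = total & ((1 << (2 * n)) - 1)          # keep the 2n-bit product
    if signed and raw >> (2 * n - 1):
        raw -= 1 << (2 * n)                     # interpret as two's complement
    return raw


print(baugh_wooley_multiply(-1, -1, n=4, signed=True))
print(baugh_wooley_multiply(11, 6, n=4, signed=False))
```

The two calls correspond to the simulation cases reported in Chapter 6 (-1 x -1 and 11 x 6), here on a 4-bit array for brevity.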

5.4 FULL ADDER

A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as A, B, and Cin; A and B are the operands, and Cin is a bit carried in from the next less significant stage. The full adder is usually a component in a cascade of adders, which add 8, 16, 32, etc. bit wide binary numbers. The circuit produces a two-bit output, the output carry and the sum, typically represented by the signals Cout and S, where the sum of the three inputs equals $2 \times C_{out} + S$.

Fig. 10 Full adder

A full adder can be implemented in many different ways, such as with a custom transistor-level circuit or composed of other gates. One example implementation is with $S = A \oplus B \oplus C_{in}$ and $C_{out} = (A \cdot B) + (C_{in} \cdot (A \oplus B))$.

In this implementation, the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic. Using only two types of gates is convenient if the circuit is being implemented using simple IC chips which contain only one gate type per chip. In this light, Cout can be implemented as $C_{out} = (A \cdot B) \oplus (C_{in} \cdot (A \oplus B))$.

A full adder can be constructed from two half adders by connecting A and B to the inputs of one half adder, connecting the sum from that to an input of the second half adder, connecting Cin to the other input, and ORing the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B, and Cin, and Cout could be made the three-bit majority function of A, B, and Cin.
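The equivalence of the carry-out forms mentioned above can be confirmed by enumerating the truth table. The Python sketch below uses hypothetical function names and checks that the OR form, the XOR form and the majority function agree on all eight input combinations.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (S, Cout) using the OR form of the carry."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def carry_xor_form(a, b, cin):
    """Carry-out with the final OR gate replaced by XOR."""
    return (a & b) ^ (cin & (a ^ b))


def carry_majority(a, b, cin):
    """Carry-out as the three-input majority function."""
    return (a & b) | (a & cin) | (b & cin)


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            # The two output bits encode the arithmetic sum of the inputs.
            assert a + b + cin == 2 * cout + s
            assert cout == carry_xor_form(a, b, cin) == carry_majority(a, b, cin)
            print(a, b, cin, "->", s, cout)
```

The OR can be replaced by XOR because the two terms A.B and Cin.(A xor B) can never both be 1 at the same time.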

5.5 CARRY SELECT LEVEL ADDER

The concept of the CSLA is to compute alternative results in parallel and subsequently select the correct result with single or multiple stage hierarchical techniques. In the CSLA both sum and carry bits are calculated for the two alternatives Cin = 0 and Cin = 1. Once Cin is delivered, the correct computation is chosen using a mux to produce the desired output. Instead of waiting for Cin before calculating the sum, the sum is output as soon as Cin arrives. The time taken to compute the sum is thus hidden, which gives a good improvement in speed. In this scheme, blocks of bits are added in two ways: one assuming a carry-in of 0 and the other a carry-in of 1. This process results in two precomputed sum and carry-out signal pairs $\{s^0_{i-1:k}, c^0_i;\ s^1_{i-1:k}, c^1_i\}$; later, as the block's true carry-in ($c_k$) becomes known, the correct signal pair is selected. Figure 11 depicts the carry-select adder structure for n-bit binary numbers. In the following, the logic expressions and complexity of the carry-select adder are given.

Figure 11 carry select-adder(CSLA)

LOGIC EQUATIONS:

$s_{i-1:k} = \bar{c}_k\, s^0_{i-1:k} + c_k\, s^1_{i-1:k}$

$c_i = \bar{c}_k\, c^0_i + c_k\, c^1_i$

COMPLEXITY AND DELAY FOR N-BIT CSLA:

$A_{CSLA} = O(n) = 14n$

$T_{CSLA} = O(n^{1/(l+1)}) = 2.8\, n^{1/2}$
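The selection scheme can be modelled in a few lines of Python. This is a sketch under stated assumptions: the function names, the fixed block size and the use of a ripple adder inside each block are choices made for the example, not details taken from the project's Verilog. Each block computes both candidate results, and the incoming block carry then acts as the mux select.

```python
def ripple_add(a, b, cin, width):
    """Ripple-carry add of two `width`-bit values; returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, n=16, block=4):
    """Carry-select add of two n-bit values; returns (sum, carry_out)."""
    result, carry = 0, 0
    for k in range(0, n, block):
        width = min(block, n - k)
        xa = (a >> k) & ((1 << width) - 1)
        xb = (b >> k) & ((1 << width) - 1)
        # Both alternatives are computed before the true carry-in is known.
        s0, c0 = ripple_add(xa, xb, 0, width)
        s1, c1 = ripple_add(xa, xb, 1, width)
        # Mux: the block's true carry-in selects the precomputed pair.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << k
    return result, carry


print(carry_select_add(0x00FF, 0x0001))
```

In hardware the two ripple adders of every block run concurrently, so only the chain of mux selections lies on the critical path.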

5.6 CARRY LOOK AHEAD ADDER

The carry look-ahead adder produces carries faster because the carry bits are generated in parallel by additional circuitry whenever the inputs change. This technique uses carry bypass logic to speed up carry propagation. Let $a_i$ and $b_i$ be the augend and addend inputs, $c_i$ the carry input, and $s_i$ and $c_{i+1}$ the sum and carry-out of the i-th bit position. The auxiliary functions $p_i$ and $g_i$, called the propagate and generate signals, and the sum output are defined as follows.

Figure 12 Carry look-ahead adder

LOGIC EQUATIONS

$p_i = a_i + b_i$

$g_i = a_i b_i$

$s_i = a_i \oplus b_i \oplus c_i$

$c_{i+1} = g_i + p_i c_i$

As the number of bits in the carry look-ahead adder increases, the complexity grows because the number of gates in the expression for $c_{i+1}$ increases. In practice it is therefore not desirable to use the traditional CLA shown above, because it increases both the space required and the power.

Instead, smaller carry look-ahead adders are used in levels to create a larger CLA. The smaller CLA is commonly a 4-bit CLA, which defines carry look-ahead over a group of 4 bits. The carry generate and carry propagate terms are then redefined as the group generated carry $g_{[i,i+3]}$ and the group propagated carry $p_{[i,i+3]}$, which in their standard form are

$g_{[i,i+3]} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$

$p_{[i,i+3]} = p_{i+3} p_{i+2} p_{i+1} p_i$

COMPLEXITY AND DELAY FOR n-BIT CLA STRUCTURE

$A_{CLA} = O(n) = 14n$

$T_{CLA} = O(\log n) = 4 \log_2 n$
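A 4-bit look-ahead group and its use in a wider adder can be sketched in Python as below. The function names are hypothetical, and for brevity the group carries are passed from one group to the next in a loop, whereas a full multi-level CLA would compute them with a second look-ahead unit. The propagate signal uses the OR definition $p_i = a_i + b_i$ given above.

```python
def cla_block4(a, b, c0):
    """4-bit carry look-ahead group.

    Returns (sum, group_generate, group_propagate). Every carry is written
    directly in terms of p, g and c0, so none waits on a ripple chain.
    """
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[i]) << i
    return s, group_g, group_p


def cla_add(a, b, n=16, cin=0):
    """Add two n-bit values (n a multiple of 4) from 4-bit CLA groups."""
    result, carry = 0, cin
    for k in range(0, n, 4):
        s, gg, gp = cla_block4((a >> k) & 0xF, (b >> k) & 0xF, carry)
        result |= s << k
        carry = gg | (gp & carry)   # group carry-out from the group signals
    return result, carry


print(cla_add(0x0FFF, 0x0001))
```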

5.7 SIGNED & UNSIGNED BIT

The sign bit is the bit in a signed number representation that indicates the sign of the number. Only signed numeric data types have a sign bit, and it is usually the leftmost bit, where the most significant bit of an unsigned number resides. It is used to identify whether the number is positive or negative. The sign bit is situated one place beyond the standard set of value digits. An unsigned number has only zero and positive values. Floating point numbers in IEEE format are always signed, with the sign bit in the leftmost position. Typically, if the sign bit is 1 then the number is negative (in the case of two's complement integers) or non-positive (for ones' complement integers, sign-and-magnitude integers, and floating point numbers), while 0 indicates a non-negative number.

In the two's complement representation, the sign bit has the weight $-2^{w-1}$, where w is the number of bits. In the ones' complement representation, the most negative value is $1 - 2^{w-1}$, but there are two representations of zero, one for each value of the sign bit. In a sign-and-magnitude representation, the value of the sign bit determines whether the numerical value is positive or negative. When an 8-bit value is added to a 16-bit value using signed arithmetic, the processor propagates the sign bit through the high order half of the 16-bit register holding the 8-bit value, a process called sign extension or sign propagation. Sign extension is used whenever a smaller signed data type needs to be converted into a larger signed data type while retaining its original numerical value.

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value $-(2^N)$ (two's complement);
the sign bit has the value $-(2^N - 1)$ (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.

It was noted previously that we will not be using a minus sign (-) to represent negative numbers. We would like to represent our binary numbers with only two symbols, 0 and 1. There are a few ways to represent negative binary numbers. The simplest of these methods is called ones' complement, where the sign of a binary number is changed by simply toggling each bit (0s become 1s and vice versa).

This has some difficulties, among them the fact that zero can be represented in two different ways (for an eight-bit number these would be 00000000 and 11111111). Instead, we will use a method called two's complement notation, which avoids the pitfalls of ones' complement but is a bit more complicated.

Table 4 signed and unsigned bits
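The representations discussed in this section can be compared with a short Python sketch. The helper names are hypothetical; each function interprets a raw w-bit pattern under one convention, and `sign_extend` widens a two's complement pattern as described above.

```python
def twos_complement_value(raw, w):
    """Interpret a w-bit pattern as two's complement (sign bit weight -2**(w-1))."""
    raw &= (1 << w) - 1
    return raw - (1 << w) if raw >> (w - 1) else raw


def ones_complement_value(raw, w):
    """Interpret a w-bit pattern as ones' complement (negation toggles every bit)."""
    mask = (1 << w) - 1
    raw &= mask
    return -(~raw & mask) if raw >> (w - 1) else raw


def sign_extend(raw, from_w, to_w):
    """Copy the sign bit of a from_w-bit pattern into the new upper bits."""
    raw &= (1 << from_w) - 1
    if raw >> (from_w - 1):
        raw |= ((1 << to_w) - 1) & ~((1 << from_w) - 1)
    return raw


print(twos_complement_value(0b11111111, 8))   # all ones in two's complement
print(ones_complement_value(0b11111111, 8))   # all ones is the second zero in ones' complement
print(hex(sign_extend(0x80, 8, 16)))          # 8-bit pattern widened to 16 bits
```

The value is unchanged by sign extension: the 8-bit pattern 0x80 and its 16-bit extension both read as -128 in two's complement.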

5.8 IMPLEMENTATION OF ALTERA DE2 EP2C35F672C6N BOARD

The DE2 board has software support for standard I/O interfaces and a control panel facility for accessing various components. Software is also provided for a number of demonstrations that illustrate the advanced capabilities of the DE2 board. In order to use the DE2 board, the user has to be familiar with the Quartus II software. The necessary knowledge can be acquired by reading the tutorials Getting Started with Altera's DE2 Board and Quartus II Introduction (which exists in three versions based on the design entry method used, namely Verilog, VHDL or schematic entry). To provide maximum flexibility for the user, all connections are made through the Cyclone II FPGA device. Thus, the user can configure the FPGA to implement any system design.

CHAPTER 6 RESULTS AND DISCUSSION

6.1 SIMULATION RESULT FOR UNSIGNED BIT MULTIPLIER

Fig. 13 Unsigned multiplier

The unsigned multiplier has two inputs, 11 and 6, and produces the output 66. The inputs are unsigned values and the output is also an unsigned value. An unsigned multiplier produces only positive values.

6.2 SIMULATION RESULT FOR SIGNED BIT MULTIPLIER

Fig. 14 Signed multiplier

The signed multiplier has two inputs, -1 and -1, and produces the output 1. The inputs are signed values and the result in this case is positive. A signed multiplier can produce both positive and negative values. The combined signed and unsigned multiplier produces both signed and unsigned output values.

6.3 SIMULATION RESULT FOR FFT INPUT

Fig.15 FFT input

6.3.1 SIMULATION RESULT FOR FFT 1ST STAGE

Fig.16 FFT 1st stage

6.3.2 SIMULATION RESULT FOR FFT 2nd STAGE

Fig.17 FFT 2nd STAGE

6.3.3 SIMULATION RESULT FOR FFT 3rd STAGE

Fig.18 FFT 3rd STAGE

6.3.4 SIMULATION RESULT FOR FFT 4th STAGE

Fig. 19 FFT 4th stage

Four stages are used for the 128 FFT. Each of the stages s1, s2, s3 and s4 consists of the sub-stages c, d, e and f. Input 128 = output 128.

1st stage=C_32+d_32+e_32+f_32



2nd stage=C_32+d_32+e_32+f_32



3rd stage=C_32+d_32+e_32+f_32



4th stage=C_32+d_32+e_32+f_32

The outputs of the CSLA and CLAA versions are the same, but the area, power and delay vary according to the adder selected. This is shown in the analysis reports.

6.3.5 SIMULATION RESULT FOR FFT OUTPUT FOR CSLA

Fig.20 output for CSLA

6.3.6 SIMULATION RESULT FOR FFT OUTPUT FOR CLAA

Fig.21 output for CLAA

6.4 HARDWARE IMPLEMENTATION

In the Quartus window, choose the New Project Wizard, load the proposed design code and click the Start Compilation icon.

Once the compilation is completed, click the Programmer icon; the Programmer window opens.

Click the Hardware Setup button and select the corresponding hardware type.

The compilation creates a .sof file.

Select the .sof file and click the Start button. Once the process is completed, the program file is loaded onto the kit.

The Altera kit has eighteen toggle switches, SW0 to SW17. SW0 is used as reset and SW1 to SW3 are used as inputs. The outputs are displayed in hexadecimal format on the seven-segment display.

Fig.22 Hardware Implementation

6.5 ANALYSIS AND SYNTHESIS RESULTS

The Verilog code is used to simulate the CSLA and CLAA based multipliers. The area, power and delay are analysed using Altera Quartus II.

Area analysis for CSLA based multiplier

Power analysis for CSLA based multiplier

Delay time analysis for CSLA based multiplier

Area analysis for CLAA based multiplier

Power analysis for CLAA based multiplier

Delay analysis for CLAA based multiplier

COMPARISON RESULTS:

PARAMETERS                     CLAA         CSLA
AREA (Total logic elements)    17,198       13,930
POWER                          115.94 mW    115.53 mW
FMAX                           37.54 MHz    43.37 MHz

Table 5 Comparison result

CHAPTER 7 CONCLUSION

This report presented the implementation of a 64-bit signed and unsigned multiplier with CLAA and CSLA. The Baugh-Wooley multiplier was built from full adder AND gate cells and reconfigurable cells, so that it acts as both a signed and an unsigned multiplier. This type of multiplier is used in DSP applications such as the FFT, because it performs both signed and unsigned multiplication. Using the carry select adder and the carry look-ahead adder, the area, power and delay can be reduced. Comparing the two adder-based multipliers, the carry select adder based multiplier achieves lower area and power and a higher maximum frequency (13,930 logic elements, 115.53 mW and 43.37 MHz, against 17,198 logic elements, 115.94 mW and 37.54 MHz for the CLAA). Verilog was used in ModelSim to simulate the multiplier, and Altera Quartus II was used for the implementation.

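CHAPTER 8 APPENDIX

8.1 SOURCE CODE

BAUGH-WOOLEY MULTIPLIER

// 4 x 4 signed Baugh-Wooley array as listed. The port widths are narrowed to
// the bits this listing actually drives (the extracted text declared p as [63:0]).
// Cell port order: (A, B, SI, CI, SO, CO).
module BW(a, b, p);
  input  [3:0] a, b;
  output [7:0] p;

  wire c0c, c1s, c1c, c2s, c2c, c3s, c3c;               // row 1
  wire c4c, c5s, c5c, c6s, c6c, c7s, c7c;               // row 2
  wire c8c, c9s, c9c, c10s, c10c, c11s, c11c;           // row 3
  wire c12c, c13s, c13c, c14s, c14c, c15s, c15c;        // row 4
  wire co;

  // Row 1: partial products with b[0]; the a[3] term uses the NAND cell.
  faa_cell fa1 (a[0], b[0], 1'b0, 1'b0, p[0], c0c);
  faa_cell fa2 (a[1], b[0], 1'b0, 1'b0, c1s,  c1c);
  faa_cell fa3 (a[2], b[0], 1'b0, 1'b0, c2s,  c2c);
  fan_cell fa4 (a[3], b[0], 1'b0, 1'b0, c3s,  c3c);

  // Row 2: partial products with b[1].
  faa_cell fa5 (a[0], b[1], c0c, c1s,  p[1], c4c);
  faa_cell fa6 (a[1], b[1], c1c, c2s,  c5s,  c5c);
  faa_cell fa7 (a[2], b[1], c2c, c3s,  c6s,  c6c);
  fan_cell fa8 (a[3], b[1], c3c, 1'b0, c7s,  c7c);

  // Row 3: partial products with b[2].
  faa_cell fa9 (a[0], b[2], c4c, c5s,  p[2], c8c);
  faa_cell fa10(a[1], b[2], c5c, c6s,  c9s,  c9c);
  faa_cell fa11(a[2], b[2], c6c, c7s,  c10s, c10c);
  fan_cell fa12(a[3], b[2], c7c, 1'b0, c11s, c11c);

  // Row 4: partial products with b[3] (MSB); only a[3]b[3] stays an AND term.
  fan_cell fa13(a[0], b[3], c8c,  c9s,  p[3], c12c);
  fan_cell fa14(a[1], b[3], c9c,  c10s, c13s, c13c);
  fan_cell fa15(a[2], b[3], c10c, c11s, c14s, c14c);
  faa_cell fa16(a[3], b[3], c11c, 1'b0, c15s, c15c);

  // Final adder. The 1'b1 carry-in and the 1'b1 in the top bit of rca_b are
  // the Baugh-Wooley correction constants at bit 4 and bit 7.
  wire [3:0] rca_a;
  assign rca_a = {c15c, c14c, c13c, c12c};
  wire [3:0] rca_b;
  assign rca_b = {1'b1, c15s, c14s, c13s};
  wire [3:0] rca_p;
  RCA ca(rca_a, rca_b, 1'b1, rca_p, co);
  assign p[7:4] = rca_p;
endmodule

RCA

// 4-bit ripple carry adder (widths narrowed to the four bits that are wired).
module RCA(a, b, ci, so, co);
  input  [3:0] a, b;
  input        ci;
  output [3:0] so;
  output       co;
  wire w1, w2, w3;

  fa fa1(a[0], b[0], ci, so[0], w1);
  fa fa2(a[1], b[1], w1, so[1], w2);
  fa fa3(a[2], b[2], w2, so[2], w3);
  fa fa4(a[3], b[3], w3, so[3], co);
endmodule

FULL ADDER AND GATE

module faa_cell(A, B, SI, CI, SO, CO);
  input  A, B, SI, CI;
  output SO, CO;
  wire andg;

  and g1(andg, A, B);
  fa fa1(andg, SI, CI, SO, CO);
endmodule

FULL ADDER NAND

module fan_cell(A, B, SI, CI, SO, CO);
  input  A, B, SI, CI;
  output SO, CO;
  wire nandg;

  nand g1(nandg, A, B);
  fa fa1(nandg, SI, CI, SO, CO);
endmodule

FULL ADDER

// The fa module is instantiated above but does not appear in the listing.
// This definition is an assumption; its port order (a, b, ci, s, co) is
// inferred from the instantiations.
module fa(a, b, ci, s, co);
  input  a, b, ci;
  output s, co;

  assign s  = a ^ b ^ ci;
  assign co = (a & b) | (ci & (a ^ b));
endmodule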

8.2 CONFERENCE CERTIFICATE

CHAPTER 9 REFERENCES

[1] R. Seshadri and S. Ramakrishnan, "Design and implementation of 32 bit unsigned multiplier using CLAA and CSLA", 2013.
[2] Stefania Perri, Pasquale Corsonello and Giuseppe Cocorullo, "Area-Delay Efficient Binary Adders in QCA", 2013.
[3] Hasan Krad and Aws Yousi, "Design and Implementation of a Fast Unsigned 32-bit Multiplier Using VHDL", 2010.
[4] P. S. Mohanty, "Design and Implementation of Faster and Low Power Multipliers", Bachelor Thesis, National Institute of Technology, Rourkela, 2009.
[5] P. Asadi and K. Navi, "A novel high speed 54-54 bit multiplier", Am. J. Applied Sci., vol. 4(9), pp. 666-672, 2007.
[6] A. Sertbas and R. S. Ozbey, "A performance analysis of classified binary adder architectures and the VHDL simulations", J. Elect. Electron. Eng., Istanbul, Turkey, vol. 4, pp. 1025-1030, 2004.
[7] P. C. H. Meier, R. A. Rutenbar and L. R. Carley, "Exploring Multiplier Architecture and Layout for Low Power", CIC'96, 1996.
[8] D. Geethanjali, "Implementation of 64 bit signed and unsigned multiplier on Altera Quartus-II", International conference on emerging trends of technology, March 2014.
