# 128-Bit Advanced FPGA Architecture of Multiplier with Reduced Area Count

Amit Kumar Niwariya<sup>1</sup>, Prof. Sanjeev Shrivastava<sup>2</sup>, Prof. Suresh Gawande<sup>3</sup>

<sup>1</sup>M-Tech Research, <sup>2</sup>HOD, <sup>3</sup>Research Guide Department of Electronics and Communication Engineering Bhabha Research Engineering Institute, Bhopal

Abstract - In a digital logic system a multiplier plays a very important role to perform calculations which highly depends on the adder utilized in the design. Important parameters of the multiplier are area, delay and power. Where area is mostly depends on the adder design, so that the adder should have the less area count to make multiplier working faster. Smaller area count design consumes less power also which is important criteria for the fabrication of chips and high performance systems. Optimizing the area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. In our project we try to determine the best solution to this problem by comparing a few multipliers. The multiplier architecture proposed in this paper is of 128 bit using binary to excess-1 converter (BEC) based carry select adder (CSLA). The application of BEC instead of Ripple Carry Adder(RCA) is more beneficial in terms of area of the logic, lower delay and the lower power consumption of the circuit.

### Keywords - Multiplier, RCA, BEC, Area, Delay.

### I. INTRODUCTION

Multipliers are key components of many high performance systems such as FIR filters, microprocessors, digital signal processors, etc. A system's performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest clement in the system. Furthermore, it is generally the most area consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. As a result, a whole spectrum of multipliers with different area-speed constraints have been designed with fully parallel. Multipliers at one end of the spectrum and fully serial multipliers at the other end. In between are digit serial multipliers where single digits consisting of several bits are operated on. These multipliers have moderate performance in both speed and area. However, existing digit serial multipliers have been Plagued by complicated switching systems and/or irregularities in design. Radix 2<sup>n</sup> multipliers which operate on digits in a parallel fashion instead of bits bring the pipelining to the digit level and

avoid most of the above problems. They were introduced by M. K. Ibrahim in 1993. These structures are iterative and modular. The pipelining done at the digit level brings the benefit of constant operation speed irrespective of the size of the multiplier. The clock speed *is* only determined by the digit size which is already fixed before the design is implemented.

Carry Select Adder:

In electronics, a carry-select adder is a particular way to implement an adder, which is a logic element that computes the (n + 1)-bit sum of two *n*-bit numbers. The carry-select adder is simple but rather fast, having a gate level depth of  $O(\sqrt{n})$ .

The carry-select adder generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, one time with the assumption of the carry being zero and the other assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is then selected with the multiplexer once the correct carry is known.

The number of bits in each carry select block can be uniform, or variable. In the uniform case, the optimal delay occurs for a block size of  $\lfloor \sqrt{n} \rfloor$ . When variable, the block size should have a delay, from addition inputs A and B to the carry out, equal to that of the multiplexer chain leading into it, so that the carry out is calculated just in time. The  $O(\sqrt{n})$  delay is derived from uniform sizing, where the ideal number of full-adder elements per block is equal to the square root of the number of bits being added, since that will yield an equal number of MUX delays.

### Basic building block

Above is the basic building block of a carry-select adder, where the block size is 4. Two 4-bit ripple carry adders are multiplexed together, where the resulting carry and sum bits are selected by the carry-in. Since one ripple carry adder assumes a carry-in of 0, and the other assumes a carry-in of 1, selecting which adder had the correct assumption via the actual carry-in yields the desired result.



II. BINARY MULTIPLIER

A Binary multiplier is an electronic hardware device used in digital electronics or a computer or other electronic device to perform rapid multiplication of two numbers in binary representation. It is built using binary adders.

The rules for binary multiplication can be stated as follows

1. If the multiplier digit is a 1, the multiplicand is simply copied down and represents the product.

2. If the multiplier digit is a 0 the product is also 0.

For designing a multiplier circuit we should have circuitry to provide or do the following three things:

1. it should be capable identifying whether a bit is 0 or 1.

2. It should be capable of shifting left partial products.

3. It should be able to add all the partial products to give the products as sum of partial products.

4. It should examine the sign bits. If they are alike, the sign of the product will be a positive, if the sign bits are opposite product will be negative. The sign bit of the product stored with above criteria should be displayed along with the product.

From the above discussion we observe that it is not necessary to wait until all the partial products have been formed before summing them. In fact the addition of partial product can be carried out as soon as the partial product is formed.

Array Multiplier :

Array multiplier is well known due to its regular structure. Multiplier circuit is based on add and shift algorithm. Each partial product is generated by the multiplication of the multiplicand with one multiplier bit. The partial product are shifted according to their bit orders and then added. The addition can be performed with normal carry propagate adder. N-1 adders are required where N is the multiplier length.



An example of 4-bit multiplication method is shown below:



Multiplter For Unsigned Data:

Multiplication involves the generation of partial products, one for each digit in the multiplier, as in Figure3.These partial products are then summed to produce the final product. The multiplication of two n-bit binary integers results in a product of up to 2n bits in length [2]. We used the following algorithm to implement the multiplication operation for unsigned data

# III. PROPOSED ARCHITECTURE

Multiplication Algorithm for 128 bit:

# INTERNATIONAL JOURNAL OF SCIENTIFIC PROGRESS AND RESEARCH (IJSPR) Volume-18, Number - 03, 2015

Let the product register size be 256 bits. Let the multiplicand registers size be 128 bits. Store the multiplier in the least significant half of the product register. Clear the most significant half of the product register.

Repeat the following steps for 128 times:

1. If the least significant bit of the product register is "1" then add the multiplicand to the most significant half of the product register.

2. Shift the content of the product register one bit to the right (ignore the shifted-out bit.)

3. Shift-in the carry bit into the most significant bit of the product register. Fig. 4. Shows a block diagram for such a multiplier [2].



Fig. 4 Two n-Bit Multiplier Design



Fig. 5 128-Bit Area, Delay Efficient BEC Based Adder

| Device utilization summary:         |     |        |      |             |     |
|-------------------------------------|-----|--------|------|-------------|-----|
| Selected Device : 6slx4tqg144-3     |     |        |      |             |     |
| Slice Logic Utilization:            |     |        |      |             |     |
| Number of Slice LUTs:               | 344 | out of | 2400 | <b>14</b> % |     |
| Number used as Logic:               | 344 | out of | 2400 | <b>14</b> % |     |
| Slice Logic Distribution:           |     |        |      |             |     |
| Number of LUT Flip Flop pairs used: | 344 |        |      |             |     |
| Number with an unused Flip Flop:    | 344 | out of | 344  | 100%        |     |
| Number with an unused LUT:          | 0   | out of | 344  | 0%          |     |
| Number of fully used LUT-FF pairs:  | 0   | out of | 344  | 0%          |     |
| Number of unique control sets:      | 0   |        |      |             |     |
| IO Utilization:                     |     |        |      |             |     |
| Number of IOs:                      | 386 |        |      |             |     |
| Number of bonded IOBs:              | 384 | out of | 102  | 376%        | (*) |

Fig. 6 Device Utilization Summary

## IV. SYNTHESIS RESULTS

The proposed 128-Bit BEC based multiplier architecture is shown in the previous section has better delay profile and the area required. The calculation of the delay and area is shown here. The implemented Advanced 128-Bit multiplier design is implemented on FPGA Spartan 6 board. The area count summary is given in Table I.

The Logic Cell counts are:

### *1-Logic Cell = 1.6 Slice LUTs*

Table I: Area Count Summary Comparison

| Area Utilization (Logic Cells) |      |           |             |  |  |  |  |
|--------------------------------|------|-----------|-------------|--|--|--|--|
| Logic Utilization              | Used | Available | Utilization |  |  |  |  |
| Our Work                       | 550  | 3840      | 14%         |  |  |  |  |
| Previous Work                  | 2696 | 3840      | 70%         |  |  |  |  |

### V. CONCLUSION AND FUTURE SCOPE

studied about different adders among compared them by different criteria like Area, Time and then Area-Delay Product etc. so that we can judge to know which adder was best suited for situation. After comparing all we came to a conclusion that Carry Select Adders are best suited for situations where Speed is the only criteria. Similarly Ripple Carry Adders are best suited for Low Power Applications. it is suitable for situations where both low power and fastness are a criteria such that we need a proper balance between both as is the case with our Project.

Multipliers are one the most important component of many systems. So we always need to find a better solution in case of multipliers. Our multipliers should always consume less power and cover less power. So through our project we try to determine which of the three algorithms works the best. In the end we determine that radix 4 modified booth algorithm works the best.

### REFERENCES

- K.H. Tsoi, P.H.W. Leong, "Mullet a parallel multiplier generator," fpl, pp.691-694, International Conference on Field Programmable Logic and Applications, 2005., 2005
- [2] S. Tahmasbi Oskuii, P. G. Kjeldsberg, and O. Gustafsson, "Transition activity aware design of reduction-stages for parallel multipliers," in Proc. 17th Great Lakes Symp. On VLSI, March 2007, pp. 120–125.
- [3] Ayman A. Fayed, Magdy A. Bayoumi, "A Novel Architecture for Low- Power Design of Parallel Multipliers," wvlsi,

pp.0149, IEEE Computer Society Workshop on VLSI 2001, 2001

- [4] M. 0. Lakshmanan, Alauddin Mohd Ali, "High Performance Parallel Multiplier Using Wallace-Booth Algorithm," IEEE International Conference on Semiconductor Electronics, pp. 433-436, 2002.
- [5] Design Compiler User Guide. Synopsys, Inc., Nov. 2000.
- [6] Z. Huang, "High-Level Optimization Techniques for Low-Power Multiplier Design," PhD dissertation, Univ. of California, Los Angeles, June 2003.
- [7] L. Sriraman and T. N. Prabakar, "Design and implementation of two variable multiplier using KCM and Vedic Mathematics," *Recent Advances in Information Technology* (*RAIT*), 2012 1st International Conference on, Dhanbad, 2012, pp. 782-787.
- [8] E. G. Walters, "24-bit significand multiplier for FPGA floating-point multiplication," 2015 49th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2015, pp. 717-721.
- [9] S. S. Bhairannawar, R. Rathan, K. B. Raja, K. R. Venugopal and L. M. Patnaik, "FPGA based Recursive Error Free Mitchell Log Multiplier for image Filters," *Computational Intelligence & Computing Research (ICCIC), 2012 IEEE International Conference on*, Coimbatore, 2012, pp. 1-5.
- [10] U. C. S. Pavan Kumar, A. Saiprasad Goud and A. Radhika, "FPGA implementation of high speed 8-bit Vedic multiplier using barrel shifter," *Energy Efficient Technologies for Sustainability (ICEETS), 2013 International Conference on*, Nagercoil, 2013, pp. 14-17.
- [11] M. J. Rao and S. Dubey, "A high speed and area efficient Booth recoded Wallace tree multiplier for fast arithmetic circuits," *Microelectronics and Electronics (PrimeAsia), 2012 Asia Pacific Conference on Postgraduate Research in,* Hyderabad, 2012, pp. 220-223.
- [12] Jinghua Li, Yuyuan Du and Jun Wang, "Design a pocket multi-bit multiplier in FPGA," ASIC, 1996., 2nd International Conference on, Shanghai, 1996, pp. 275-279.
- [13] P. Mehta and D. Gawali, "Conventional versus Vedic Mathematical Method for Hardware Implementation of a Multiplier," Advances in Computing, Control, & Telecommunication Technologies, 2009. ACT '09. International Conference on, Trivandrum, Kerala, 2009, pp. 640-642.