# Design and Analysis of Finite Impulse Response Using Gate Diffusion Input (GDI) Circuits

Mehdi Faed<sup>1</sup>, Mohammad Mortazavi<sup>2</sup>, Alireza Faed<sup>3</sup>

<sup>1</sup>Raja University, Department of Telecommunication and Electronic Engineering, Qazvin, Iran *Mehdi.Faed@raja.ac.ir*

<sup>2</sup>Sharif University of Technology, International Campus, School of Science and Engineering, Electrical Engineering Department, Kish, Iran *Mortazavi@sharif.ir*

<sup>3</sup>Curtin University of Technology, School of Information System, Perth, Australia *Alireza.Faed@postgrad.curtin.edu.au*

Abstract: Integrated circuit technology has consistently migrated to smaller feature sizes over the last four decades, allowing more functional circuits to be placed on each chip. The increased density can be used to decrease cost and/or increase functionality, as described by Moore's law. Since higher performance and lower power consumption are becoming attainable by further following Moore's law, there is intense competition among manufacturers to use finer geometries. The Gate-Diffusion-Input (GDI) technique has become a focus of attention in digital circuit design because it requires less power while delivering more efficiency. GDI-based circuits resemble CMOS circuits but use fewer transistors, with more functionality and higher performance capability. This study addresses the design and analysis of a Finite-Impulse-Response (FIR) block built from basic blocks implemented with GDI-based circuits. An 8-tap FIR design based on GDI cells has been implemented and compared with NMOS-based and CMOS-based circuits. The results show that the 8-tap FIR design using GDI has lower power consumption, higher performance, and less die area.

*Keywords:* CMOS, NMOS, GDI, DSP, POWER CONSUMPTION, PROPAGATION DELAY, SLEW, FIR

# I. Introduction

FIR blocks are among the most important blocks in Digital Signal Processing (DSP) design. They are widely used in industry and in digital systems such as automotive electronics, mobile phones, the internet, laptops, computers, speech processing, and Bluetooth headsets. The requirements for designing an electronic system have two major drivers: the first is technology-driven and the second is market-driven. On the technology side, most industries are improving their technology and devices toward greater complexity: more functionality, higher density in order to place millions of transistors on a smaller die area, increased performance, and lower power dissipation. On the market side, each new development must be taken seriously and acted on quickly, since missing the market window can be very costly.

A look at the GDI literature shows that no research or project has so far applied GDI to a DSP block. Moreover, in previous technologies several transistors were needed to design basic blocks, whereas in the present project basic blocks can be designed using as few as two transistors. The advantages of GDI, namely high performance (high speed), low power consumption, and low area occupation, led the authors of the present research to implement GDI in an FIR filter. The basic blocks and FIR filters were designed and implemented in HSPICE using the TSMC 0.18 µm CMOS technology.

This paper is organized as follows. Section II describes the basic principle of Gate Diffusion Input (GDI), which is similar to a CMOS inverter, and its related work. Sections III and IV explain FIR filters and the system-level architecture of the 8-tap digital FIR. Section V shows the hardware implementation of the basic FIR components: latch, adder, and multiplier. Section VI presents the simulation results and analysis of the FIR design. Finally, Section VII gives our concluding remarks.

#### **II. Related work**

This research builds on the framework of A. Morgenshtein et al. [1-3]. The main idea is to use a structure similar to the CMOS inverter, with a pull-up PMOS and a pull-down NMOS transistor, in which the bulk of the PMOS is connected to the source of the pull-up transistor and the bulk of the NMOS is connected to the drain of the pull-down transistor; different input settings then yield various basic function blocks. Those authors compared GDI-based blocks with NMOS and CMOS blocks with respect to propagation delay, rise and fall time, and number of transistors. The results showed that GDI-based basic blocks consume less power, need fewer transistors, and have lower rise and fall times. The authors further presented a Carry Look-ahead Adder (CLA) as a novel GDI technique for low-power design, building an 8-bit CLA adder in both GDI and CMOS. Compared with the existing Transmission Gate (TG) and N-PG techniques, GDI showed up to a 45% reduction in power-delay product over CMOS on that chip and a remarkable improvement in performance, as well as a decreased number of transistors and area in most simulated GDI circuits relative to NMOS and Pass Transistor Logic (PTL). In all of these previous works, GDI was used in combinational and sequential logic blocks but not in DSP blocks. A further enhancement to the original GDI cell defined five terminals to create primitive and arithmetic blocks. In [10], a novel design of a low-power 1-bit full adder cell was proposed, in which the GDI technique is used for the simultaneous generation of XOR and XNOR functions. Simulation results, obtained with HSPICE in 0.18 µm CMOS technology, indicated that the new full adder circuit outperforms many recent designs in energy efficiency and has the lowest power-delay product over a wide range of voltages among several low-power adder cells of different CMOS logic styles.
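The way one two-transistor cell yields different functions under different input settings can be illustrated with an idealized switch-level model. The sketch below is not taken from this paper: it assumes the usual description of the GDI cell in the cited literature, with a common gate input G, an input P on the PMOS side, and an input N on the NMOS side, and it ignores threshold drops and bulk connections. The function names are hypothetical.

```python
def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: the PMOS passes P when G = 0, the NMOS passes N when G = 1."""
    return n if g else p


# Input settings assumed from the GDI literature, not stated in this text.
def f_and(a: int, b: int) -> int:
    return gdi_cell(a, 0, b)      # P tied to 0, N driven by B


def f_or(a: int, b: int) -> int:
    return gdi_cell(a, b, 1)      # P driven by B, N tied to 1


def f_mux(a: int, b: int, c: int) -> int:
    return gdi_cell(a, b, c)      # selects B when A = 0, C when A = 1


def f_not(a: int) -> int:
    return gdi_cell(a, 1, 0)      # the plain inverter setting
```

In this idealized view every function costs the same two transistors, which is the source of the transistor-count advantage discussed above; a real cell may need a buffer to restore the output swing.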

In [13], new full adder circuits based on GDI were implemented to decrease power consumption. According to transistor-level simulations, the power consumption is decreased by at least 62% relative to the Sense Energy Recovery Full adder (SERF) design and 86% relative to the GDI (Gate Diffusion Input) full adder design. The cost is a small area overhead of up to 11% compared with the SERF and GDI full adders. In addition, the work showed that GDI logic can be suitable for ultra-low-power applications, for which these subthreshold circuits are employed. Finally, in [14], a low-power, high-performance 1-bit full adder cell was designed based on the framework of [2]. The aim of that work is to reduce power and increase speed in the full adder using the GDI technique, with techniques such as size optimization reducing the power consumption. As a result, the full adder works at 100 MHz with 0.78 µW power consumption. These results were obtained with SPICE simulation from the extracted netlist of the layouts for normal parameters, room temperature, and a 1.8 V power supply.

## **III. FIR filter**

Digital filters are typically used to modify or alter the attributes of a signal in the time or frequency domain. Digital signal processing (DSP) has become a mature technology and has replaced traditional analog signal processing systems in many applications. There are two important classes of filters based on impulse response type: Finite Impulse Response (FIR) and Infinite Impulse Response (IIR). FIR filters are filter structures that can be used to implement almost any type of frequency response digitally. They are finite because they have no feedback: if an impulse (a single spike) is sent through the system, the output inevitably becomes zero once the impulse has run through the filter. Lack of feedback guarantees that the impulse response is finite, so the term "finite impulse response" is almost synonymous with "lack of feedback". IIR filters, by contrast, use feedback. Both FIR and IIR filters have advantages and disadvantages. By and large, the advantages of FIR filters outweigh their disadvantages, which is why they are used much more than IIR filters.
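The link between feedback and the length of the impulse response can be seen in a small numerical sketch. The coefficients and the single feedback gain below are arbitrary illustrative values, not taken from this design.

```python
def fir_response(coeffs, x):
    """Feed-forward only: y[n] = sum_i coeffs[i] * x[n - i]."""
    return [sum(c * x[n - i] for i, c in enumerate(coeffs) if n - i >= 0)
            for n in range(len(x))]


def iir_response(a, x):
    """One feedback term: y[n] = x[n] + a * y[n - 1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


impulse = [1.0] + [0.0] * 9                            # single unity sample, then zeros
fir_out = fir_response([0.5, 0.25, 0.25], impulse)     # hypothetical 3-tap filter
iir_out = iir_response(0.5, impulse)                   # hypothetical feedback gain
```

The FIR output is exactly zero after the third sample, while the IIR output keeps halving and never reaches zero within the window.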

## **IV. Basic definition of FIR**

FIR filters play very important roles in communication applications and signal processing; in other words, they are the main cores of a DSP processor. The present study uses an eight-tap filter with linear phase, which reduces the circuit size: for an m-tap filter, only m/2 of the coefficients must be stored in memory, so the memory needed to store the coefficients decreases by half. The finite impulse response (FIR) filter is implemented in many digital signal processing (DSP) systems to perform signal preconditioning, anti-aliasing, band selection, decimation/interpolation, low-pass filtering, and video convolution functions. The basic structure of an FIR filter is a series of multiplications followed by an addition. The FIR filter operation can be represented by the following equation:

$$y[n] = x[n] * h[n] \tag{1}$$

where x, h, and y are the input signal, the impulse response, and the output signal, respectively, and * denotes convolution. The direct-form representation of the FIR filter is given by the following equation:

$$y[n] = \sum_{i=0}^{M} b_i x[n-i] = \sum_{i=0}^{M} h[i] x[n-i] \tag{2}$$

Figure 1 shows the basic block diagram of an FIR filter structure of length N as a tapped delay line. The delays make prior input samples available. The h values are the coefficients used for multiplication, so that the output at time n is the sum of all the delayed samples multiplied by the appropriate coefficients. In Fig. 1, x(n) is the sequence of input samples, h(n) (the impulse response) represents the filter coefficients, and m is the number of taps. The example filter uses eight samples of the input and is therefore called an 8-tap filter. Each register gives a unit sample delay. The delayed inputs are multiplied by their respective filter coefficients and added together to produce the output. An FIR filter is usually implemented with a series of digital hardware elements, namely delays, multipliers, and adders. An FIR filter of order N has N+1 multipliers, N adders, and N delays. In the next section, each component of a one-tap FIR filter is explained in more detail.



Figure 1: Signal flow graph for direct-form FIR structure
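The tapped delay line of Equation (2) can be sketched behaviorally as a shift register feeding one multiplier per tap. This is a minimal software model, not the hardware described later; the class name and the coefficient values are hypothetical.

```python
from collections import deque


class DirectFormFIR:
    """Direct-form FIR: N unit delays, N + 1 multipliers, one summation."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        n_delays = len(self.coeffs) - 1
        # Registers hold x[n-1] ... x[n-N]; the current input feeds tap 0 directly.
        self.delays = deque([0] * n_delays, maxlen=n_delays)

    def step(self, x):
        taps = [x, *self.delays]
        y = sum(h * t for h, t in zip(self.coeffs, taps))
        self.delays.appendleft(x)      # shift the delay line by one sample
        return y


coeffs = [1, 2, 3, 4, 4, 3, 2, 1]      # hypothetical 8-tap coefficients
fir = DirectFormFIR(coeffs)
impulse_response = [fir.step(x) for x in [1] + [0] * 9]
```

Feeding a unit impulse reproduces the coefficient list and then zeros, which is the sense in which the coefficients are the impulse response.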

#### V. Digital hardware implementation of FIR

The output of each register is called a tap and is represented by x[n-k], where k is the tap number. Each tap is multiplied by its coefficient h[k], and then all the products are added up. The following terms are used to describe an FIR filter.

*Filter Coefficients*: The series of constants, also called tap weights, used to multiply against delayed sample values. For an FIR filter, the filter coefficients are, by definition, the impulse response of the filter.

*Impulse Response:* A filter's time-domain output sequence when the input is an impulse. An impulse is a single unity-valued sample preceded and followed by zero-valued samples. For an FIR filter, the impulse response is the set of filter coefficients, which characterizes the filter's response at all frequencies.

*Tap:* A coefficient/delay pair. The number of FIR taps, typically N, indicates the amount of memory needed, the number of calculations required, and the amount of "filtering" the filter can do. In effect, more taps mean more stop-band attenuation (less of the part we want filtered out remains), less ripple (smaller variations in the pass band), and steeper roll-off (a shorter transition between the pass band and the stop band).

*Multiply and Accumulate (MAC):* In the context of FIR filters, a "MAC" is the operation of multiplying a coefficient by the corresponding delayed data sample and accumulating the result (see Figures 2 and 3). There is usually one MAC for each tap. Most DSP microprocessors implement the MAC operation in a single instruction cycle. The following term may also help in understanding the issue:

$$x[n] \longrightarrow w[n] = Ax[n]$$

Figure 2: Multiplier Unit

$$x[n] \longrightarrow w[n] = x[n] + y[n]$$

Figure 3: Adder Unit

*Delay Line:* The set of memory elements that implement the $Z^{-1}$ delay elements of the FIR calculation. A single-tap FIR consists of three basic components: a multiplier, an adder, and a sampled unit delay (see Fig. 4).

$$x[n] \longrightarrow x[n-1]$$

Figure 4: Sample Unit Delay

## A. Hardware implementation of one-tap FIR filter

The designs of a full adder, a latch, and a multiplier are shown in the following sections. The figure below shows the full adder circuit in CMOS; a pulse voltage is applied to A, "0" (ground) to B, and "1" (5 V) to $C_{IN}$.

A basic cell in digital computing systems is the 1-bit full adder which has three 1-bit inputs (a, b, and c) and two 1-bit outputs (*sum* and *carry*). The relations between the inputs and the outputs are expressed as:

$$sum = (a \oplus b) \oplus c \tag{3}$$

$$carry = a.b + c.(a \oplus b) \tag{4}$$

The logic-level structure of the full adder and its critical path delay are shown in Figures 5 and 6.



Figure 5: Full Adder Structure



Figure 6: Critical Path Delay of Full Adder

The critical path, which runs from input A to output Carry (as shown in Fig. 6), must be optimized before the transformation to transistor level. To this end, the full adder Boolean expressions may be rearranged as:

$$sum = \overline{carry}\,(a+b+c) + a.b.c \tag{5}$$

$$carry = a.b + c.(a+b) \tag{6}$$
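A truth-table check is a quick way to confirm that the rearranged expressions describe the same adder as Equations (3) and (4). The sketch below assumes that the leading factor in Equation (5) is the complement of the carry, which is the usual form of this rearrangement; the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Equations (3) and (4): XOR-based sum and carry."""
    s = (a ^ b) ^ c
    carry = (a & b) | (c & (a ^ b))
    return s, carry


def full_adder_rearranged(a, b, c):
    """Equations (5) and (6), with the sum reusing the complemented carry."""
    carry = (a & b) | (c & (a | b))
    s = ((1 - carry) & (a | b | c)) | (a & b & c)
    return s, carry


# Collect every input combination on which the two forms disagree.
mismatches = [bits for bits in product((0, 1), repeat=3)
              if full_adder(*bits) != full_adder_rearranged(*bits)]
```

The rearranged form lets the sum be derived from the carry, so the carry logic is shared instead of being duplicated in XOR gates.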

Several logic styles have been used in the past to design full adder cells. Each design style has its merits and demerits. In [9], a high-performance full adder cell using 10 transistors (10T) was presented. The cell has the advantages of low power consumption and high operating speed, and it occupies a small area due to its small transistor count. The low-power objective is achieved at the circuit level by reducing the number of internal node capacitances, by eliminating direct paths between the supply voltage and ground, and by maintaining low switching activity in the circuit. The 14T adder presented in [10] consumes considerably less power, on the order of microwatts, has higher speed, and reduces the threshold-loss problem compared with previous transistor adders.

Complementary pass-transistor logic (CPL) [11] is a CMOS-type logic family designed for low-power applications. The main concept behind CPL is the use of only an NMOS network for the implementation of logic functions and the elimination of the PMOS latch. The use of only NMOS transistors results in small input loads. CPL consists of complementary inputs and outputs, and the pass transistors function as both pull-up and pull-down devices [12]. Since the voltage level of the pass transistor output is lower than the supply voltage by the threshold voltage of the pass transistor, the signals have to be amplified, and this amplification is done using CMOS inverters. The output inverters provide good output driving capability.

The basic difference between the pass transistor logic and the complementary CMOS logic styles is that the source side of the pass logic transistor network is connected to some input signals instead of the power lines. The advantage is that one pass transistor network (either pMOS or nMOS) is sufficient to implement the logic function, which results in a smaller number of transistors and a smaller input load. However, pass transistor logic has an inherent threshold voltage drop problem: the output is a weak logic "1" when "1" is passed through an nMOS and a weak logic "0" when "0" is passed through a pMOS. Therefore, output inverters are also used to ensure drivability. The complementary CMOS full adder (CCMOS) [13] is based on the regular CMOS structure with conventional pull-up and pull-down transistors, providing full-swing output and good driving capabilities.

Other designs include transmission-function full adder (TFA) [14] and transmission-gate full adder (TGA) [15]. These designs are based on transmission-function theory and transmission gates, respectively. Transmission gate logic circuit is a special kind of pass transistor logic [14].

Another full adder design is the hybrid full adder presented in [13]. Most of these designs are based on the simultaneous generation of XOR and XNOR outputs, which are passed on to either a transmission-gate final stage or a static CMOS final stage for the generation of the Sum and Carry outputs.

#### a) Design of low-voltage GDI 1-bit fast full adder

The transistor-level schematic of the proposed 1-bit fast full adder is shown in Fig. 7. It consists of three modules based on the GDI technique. Module $M_1$ produces the generate, propagate, and XOR signals from the input signals A and B. Module $M_2$ is the carry look-ahead (CLA) realization of the carry function: given the input carry (C\_n), generate (G), and propagate (P) signals, it produces the output carry. This circuit is small enough to be modeled as a single primitive. Finally, module $M_3$ produces the sum function from the XOR signal generated by module $M_1$ and the input carry.

In the proposed full adder cell, the needed functions are generated by circuits based on the GDI technique. The robustness of these circuits against voltage scaling and transistor sizing enables them to operate reliably at low voltage. Also, the output inverters guarantee sufficient drive to the cascaded cells.



Figure 7: Transistor level schematic of the GDI fast full adder cell
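The three-module decomposition can be checked at the logic level with a short sketch. It assumes the common definitions G = A·B and P = A+B and the carry look-ahead form G + P·C; the text names the signals but does not give their equations, so these are assumptions, and the function names are hypothetical.

```python
def m1(a, b):
    """Module M1: generate, propagate, and XOR of the two operand bits."""
    return a & b, a | b, a ^ b


def m2(g, p, c_in):
    """Module M2: carry look-ahead form of the carry, G + P.C (assumed)."""
    return g | (p & c_in)


def m3(x, c_in):
    """Module M3: sum from the XOR signal and the input carry."""
    return x ^ c_in


def gdi_fast_full_adder(a, b, c_in):
    g, p, x = m1(a, b)
    return m3(x, c_in), m2(g, p, c_in)   # (sum, carry)
```

Because M2 and M3 both depend only on M1's outputs and the input carry, the sum and carry are produced in parallel once the carry arrives.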

#### b) D-Latch

The D-Latch is used for the unit sample delay. One of the most important state-holding elements is the D-Flip-Flop. For the eight-tap filter mentioned above, each coefficient in the present implementation has four bits, and there are only four coefficients because the filter is taken with linear phase. The memory is designed as an 8-bit register, each cell of which is a D-Flip-Flop. The gate-level circuit structure of this D-Flip-Flop is shown in Fig. 8.



Figure 8: D-Type Transparent Latch Structure

The critical path delay is the worst-case (maximum) delay of data transferred from the CLK pin to the output Q pin, as shown in Figure 9.



Figure 9: Critical Path Delay of D-Latch

In practice, the coefficients and system inputs have to be stored in the same memory because the computations are done in real time. Plain latches are not used in the sequential circuits here, because the memory output must take the input value only when the clock pulse moves from the '0' state to the '1' state, whereas the output of a latch can change when the clock pulse moves from '0' to '1' or vice versa. As shown in Figure 10, in the circuit used in this study the output must equal the input. In order to coordinate the whole circuit structure, the output equals the input only when the clock pulse moves from 0 to 1. Sequential circuits are those in which memory elements are used.



Figure 10: D-Latch using GDI

*Gate:* Gates, built directly from transistors, are the building blocks of combinational circuits.

*Latch:* A latch, made up of a few gates, is transparent while its internal memory is being set and stores data when the clock is low. In other words, it is level-triggered (level-sensitive, level-dependent) and does not take much area. Latches are also faster than flip-flops.

*Flip-Flop:* A flip-flop is made up of latches; two latches used in master-slave fashion form a flip-flop. It is not transparent and, unlike a latch, it is sensitive to the clock transition; in other words, it is edge-triggered. It takes much more area than a latch.

*Register:* A register is a group of flip-flops, each capable of storing one bit of information, controlled by a common write enable or clock.
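The difference between level-sensitive and edge-triggered storage can be sketched with a small behavioral model. This is an illustration of the definitions above, not the GDI circuit of Figure 10; the class names and the input sequence are hypothetical.

```python
class DLatch:
    """Level-sensitive: transparent while enable is 1, holds while it is 0."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Two latches on opposite clock levels; Q only changes on a rising edge."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, 1 - clk)   # master is transparent while clk = 0
        return self.slave.update(m, clk)     # slave is transparent while clk = 1


sequence = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]   # (d, clk) pairs
latch, flop = DLatch(), MasterSlaveFlipFlop()
latch_trace = [latch.update(d, clk) for d, clk in sequence]
flop_trace = [flop.update(d, clk) for d, clk in sequence]
```

In the third step the data input falls while the clock is still high: the latch follows it immediately, while the flip-flop keeps the value captured at the rising edge.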

#### c) Multiplier

In the present implemented structure, a four-bit array multiplier has been used. Because there is a one-to-one topological correspondence between the array multiplier hardware structure and manual multiplication, it is easier to implement. Although other multiplier designs are more efficient in terms of performance and power, we chose an array multiplier for its simple structure. The generation of N partial products requires N\*M two-input AND gates. Most of the area of the multiplier is devoted to adding the N partial products, which requires N-1 M-bit adders. However, our focus is to compare the GDI structure with the CMOS structure [18].
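The correspondence with manual multiplication can be sketched behaviorally: every partial-product bit is one AND gate, and the rows are then added with increasing shifts. The sketch assumes unsigned operands and replaces the adder array with integer addition, so it models the arithmetic and not the gate-level timing; the function names are hypothetical.

```python
def partial_products(x, y, m=4, n=4):
    """Return n rows; row j holds the m AND-gate outputs x_i & y_j."""
    return [[(x >> i) & (y >> j) & 1 for i in range(m)] for j in range(n)]


def array_multiply(x, y, m=4, n=4):
    """Add the shifted partial-product rows, as in manual multiplication."""
    total = 0
    for j, row in enumerate(partial_products(x, y, m, n)):
        row_value = sum(bit << i for i, bit in enumerate(row))
        total += row_value << j          # row j is weighted by 2**j
    return total


and_gate_count = sum(len(row) for row in partial_products(0, 0))
```

For M = N = 4 the model uses 16 AND outputs, and the largest product, 15 × 15 = 225, fits in the 8-bit output.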

The critical path is shown in Fig. 11 as a dashed line; it runs from the X1 or Y0 input to the P6 output pin. The critical path delay of a 4x4 multiplier is calculated approximately through the following equation:

$$T_{Mult} = ((M-1) + (N-2))T_{Carry} + (N-1)T_{sum} + T_{and} \tag{7}$$

where $T_{Carry}$ is the propagation delay between the input and output carry, $T_{sum}$ is the delay between the input carry and the sum bit of the full adder, and $T_{and}$ is the delay of the AND gate.
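Equation (7) is simple enough to evaluate directly. The helper below is a sketch with a hypothetical function name; the unit delays in the example are placeholders used only to expose how many of each delay lie on the critical path, not measured values.

```python
def multiplier_critical_delay(m, n, t_carry, t_sum, t_and):
    """Equation (7): approximate critical path delay of an M x N array multiplier."""
    return ((m - 1) + (n - 2)) * t_carry + (n - 1) * t_sum + t_and


# Placeholder unit delays: the result counts the stages on the critical path.
stages_4x4 = multiplier_critical_delay(4, 4, t_carry=1, t_sum=1, t_and=1)
```

For M = N = 4 the path contains five carry delays, three sum delays, and one AND delay, so the carry delay of the full adder dominates the result.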





#### B. 8-Tap FIR design

Figure 12 depicts the block diagram of the direct-form FIR (Finite Impulse Response) filter. Each tap includes a D-Latch, a multiplier, and a full adder. The latch has a 4-bit input and a 4-bit output, and the multiplier has two 4-bit inputs and an 8-bit output. In this project eight sequential taps have been used, and the outputs of each D-Latch and adder are the inputs to the following tap.



Figure 12: Block Diagram of Direct form FIR
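The 8-tap structure with only four stored coefficients can be sketched as follows. The sketch assumes unsigned 4-bit samples and coefficients and the symmetric coefficient layout h[i] = h[7-i] implied by linear phase; the function names and coefficient values are hypothetical, and the accumulator is left untruncated.

```python
def mirror_coefficients(stored):
    """Expand m/2 stored coefficients into m symmetric taps: h[i] = h[m-1-i]."""
    return list(stored) + list(reversed(stored))


def fir8(stored, samples):
    """8-tap FIR built from four stored 4-bit coefficients (unsigned assumption)."""
    for value in list(stored) + list(samples):
        if not 0 <= value <= 15:
            raise ValueError("coefficients and samples are 4-bit unsigned here")
    taps = mirror_coefficients(stored)
    out = []
    for n in range(len(samples)):
        acc = 0
        for i, h in enumerate(taps):
            if n - i >= 0:
                acc += h * samples[n - i]   # each product fits in 8 bits (max 225)
        out.append(acc)
    return out


response = fir8([1, 2, 3, 4], [1, 0, 0, 0, 0, 0, 0, 0])
```

Only half of the coefficients are held in memory, while all eight taps still contribute to the output.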

#### VI. Simulation results

In this work, all transistors, in both CMOS and GDI mode, were simulated with Wn = 0.22 µm and Ln = 0.18 µm; L = 0.18 µm and W = 0.22 µm are also the values used in the buffer circuits. Because the $V_{Carry}$ output of the full adder in GDI mode falls below the input value of 5 V, a buffer circuit must be placed at the $V_{Carry}$ output, with transistor L and W of 0.18 µm and 0.22 µm respectively. In the same way, in the NMOS full adder a buffer circuit must be added at V<sub>Sum</sub>, because V<sub>Sum</sub> falls below the input voltage of 5 V; L and W of its transistors are again 0.18 µm and 0.22 µm. To find the worst case in the full adder circuit, several test cases must be run under a number of conditions.

The first test case is the AND gate, whose output is X when one input is 1 ($V_{CC}$ or 5 V) and the other is X. The PMOS width is related to the NMOS width by the following equation:

$$W_p = 3 W_n \tag{8}$$

The following subsection describes the concept of controlling and non-controlling inputs to remove any false paths in the design.

#### A. Controlling and non-controlling

False paths are paths in a design that never exist functionally, such as paths between two asynchronous clocks. The designer knows that such a path can never be true and should therefore apply values of 0 or 1 to the controlling and non-controlling inputs to eliminate false paths. If such paths are not handled correctly, the tools may waste resources and runtime resolving them, may never work out the functionally correct paths, and may report violations.
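The role of controlling and non-controlling values can be sketched with three-valued logic, where X stands for the signal whose path is being tested. A controlling value fixes a gate's output regardless of the other input, while a non-controlling value lets the other input through. The gate functions below are a hypothetical illustration, not part of the simulation setup.

```python
X = "X"   # the signal under test (unknown value)


def and3(a, b):
    if a == 0 or b == 0:       # 0 is the controlling value of AND
        return 0
    if a == 1 and b == 1:
        return 1
    return X                   # a non-controlling 1 lets X propagate


def or3(a, b):
    if a == 1 or b == 1:       # 1 is the controlling value of OR
        return 1
    if a == 0 and b == 0:
        return 0
    return X                   # a non-controlling 0 lets X propagate
```

Setting the side input of an AND gate to 1 keeps the path open for timing measurement, while setting it to 0 blocks the path.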

The proposed 1-bit full adder cell designed based on the GDI technique and seven kinds of CMOS based cells have been simulated with HSPICE [16] in 0.18 µm TSMC CMOS process [17] at 1.8 V supply voltage and 100 MHz frequency [18]. The snapshot of waveforms of the proposed full adder cell at 1.8 V is shown in Fig. 13.





Figure 13: HSPICE simulation waveforms of the full adder test circuit at 1.8 V and 100 MHz. Inputs A, B, and Cin are (a), (b), and (c), respectively. Cout and Sum for the GDI full adder without buffer are (d) and (e), and Cout and Sum for the GDI full adder with buffer are (f) and (g).

All full adder cells are laid out with optimized sizing and spacing in compliance with the design rules of the TSMC 0.18 µm CMOS process. Simulation results show that the 10T adder cell fails to function at low voltage: the lowest voltage at which it can work at 100 MHz is 1.8 V. The speed of the hybrid, 10T, and 14T cells decreases faster with the supply voltage than that of the other adder cells. The layout of the proposed 1-bit CMOS fast full adder is presented in Fig. 14.

#### A. 8-Bit Adder

The propagation and transition times from input A to Sum and from input A to Carry, that is, the rise time, fall time, and propagation delay (rise to rise, rise to fall, fall to rise, fall to fall), are calculated for the full adder.

Table 1 shows the propagation delay from input A to the sum and carry outputs, and Table 2 shows the average power and peak power of the 8-bit adder.

TABLE 1 AVG. Propagation Delay of 8-Bit Adder

| Technology | Delay A->Carry | Delay A->Sum |
|------------|----------------|--------------|
| CMOS       | 26.96          | 24.8         |
| NMOS       | 26.72          | 25.6         |
| GDI        | 25.12          | 27.36        |

TABLE 2 AVG. Power and Peak Power of 8-Bit Adder

| Technology | AVG. Power [mW] | Peak Power [mW] |
|------------|-----------------|-----------------|
| CMOS       | 6.08            | 76              |
| NMOS       | 13.6            | 28.8            |
| GDI        | 7.76            | 51.2            |



Figure 14: Proposed 1-bit CMOS fast full adder layout.

## B. 4x4 Multiplier

The critical path delay of the multiplier is calculated approximately from Equation 7 ($T_{Mult}$); for the 4x4 multiplier, M = N = 4, and the propagation delays A->Sum and A->Carry of the full adder were measured in the previous section.

As the $T_{MULTIPLIER}$ results in Table 3 show, the critical path delay of GDI is lower than that of CMOS and NMOS.

TABLE 3 T<sub>MULTIPLIER</sub> of CMOS, NMOS and GDI-4X4 Bit

| Technology | T Multiplier [ns] |
|------------|-------------------|
| CMOS       | 29.45             |
| NMOS       | 30.2              |
| GDI        | 28.9              |

An accurate area design in the L-Edit program could not be completed due to time constraints. The areas in CMOS, NMOS, and GDI were therefore compared approximately, using the approximate L\*W equation, as shown below:

TABLE 4 The Number of Transistors and Area

| Component     | CMOS # Tr | CMOS Area $(mm^2)$ | NMOS # Tr | NMOS Area $(mm^2)$ | GDI # Tr | GDI Area $(mm^2)$ |
|---------------|-----------|--------------------|-----------|--------------------|----------|-------------------|
| Full Adder    | 54        | 4.2                | 29        | 1.9                | 22       | 1.7               |
| 8-bit Adder   | 432       | 33.6               | 232       | 15.2               | 176      | 13.6              |
| 1-bit D-Latch | 18        | 1.42               | 12        | 0.47               | 16       | 1.26              |
| 4-bit D-Latch | 72        | 5.68               | 48        | 1.88               | 64       | 5.04              |
| 4-bit Mult    | 584       | 46.24              | 460       | 23.44              | 416      | 16.47             |

## C. 4- Bit D-Latches

The following table shows the results for the 4-bit D-Latch in terms of propagation delay (ns), average power (mW), and peak power (mW).

TABLE 5 Power and Propagation Delay of 4-Bit D-Latch

| Technology | Propagation Delay (ns) | Average Power (mW) | Peak Power (mW) |
|------------|------------------------|--------------------|-----------------|
| CMOS       | 11.24                  | 1.76               | 6.36            |
| NMOS       | 11.56                  | 3.08               | 4.04            |
| GDI        | 11.96                  | 2.96               | 10.48           |

## VII. Conclusion

In this research, an 8-tap digital FIR filter has been designed and implemented. The basic cells, namely the adder, D-latch, and multiplier, were implemented based on Gate-Diffusion-Input circuits. The simulation results show that the layout area, propagation delay, and power consumption were lower compared with NMOS and CMOS technology. The following tables show the percentage improvement of GDI over CMOS in terms of propagation delay, number of transistors, and total power:

TABLE 6 Propagation Delay Improvements GDI over CMOS

| Component          | CMOS  | GDI   | Percentage (%) |
|--------------------|-------|-------|----------------|
| 8 Bit Full Adder   | 26.96 | 25.12 | ↓6.8           |
| 4x4 Bit Multiplier | 29.45 | 28.9  | ↓1.7           |
| 4 Bit D-Latch      | 11.24 | 11.96 | ↑6.4           |
| 8 Tap FIR Filter   | 541.2 | 527.8 | ↓2.4           |
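The percentage column of Table 6 can be read as the change of the GDI value relative to the CMOS value. The text does not state the formula, so the helper below is a sketch under that assumption; the function name is hypothetical, and the example uses only the 8-bit adder and 4-bit D-Latch rows of Table 6.

```python
def percent_change(cmos, gdi):
    """Signed change of GDI relative to CMOS, in percent (negative = reduction)."""
    return (gdi - cmos) / cmos * 100.0


adder_delay_change = percent_change(26.96, 25.12)   # 8 Bit Full Adder row
latch_delay_change = percent_change(11.24, 11.96)   # 4 Bit D-Latch row
```

With this definition the adder row gives a reduction of about 6.8% and the latch row an increase of about 6.4%, in line with the arrows in the table.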

TABLE 7 Transistor Count Improvement of GDI over CMOS

| Component          | CMOS (# transistors) | GDI (# transistors) | Percentage (%) |
|--------------------|----------------------|---------------------|----------------|
| 8 Bit Full Adder   | 432                  | 176                 | ↓59.2          |
| 4x4 Bit Multiplier | 584                  | 416                 | ↓28.8          |
| 4 Bit D-Latch      | 72                   | 64                  | ↓11.2          |
| 8 Tap FIR Filter   | 8704                 | 528                 | ↓39.7          |

TABLE 8 Total Power Improvements GDI over CMOS

Total power [mW] = P<sub>D</sub> + P<sub>S</sub>

| Component        | CMOS | GDI  | Percentage (%) |
|------------------|------|------|----------------|
| 8 Bit Full Adder | 6.08 | 7.76 | ↑21.64         |
| 4 Bit D-Latch    | 1.76 | 2.96 | ↑68.16         |

Based on the findings from the literature and the results of our analysis, GDI technology has a wide range of applications. In this work, GDI was applied to one DSP application, the FIR filter. We have implemented an 8-tap digital FIR filter using CMOS, NMOS, and GDI techniques, taking layout area, power, worst-case delay, and timing into account. The simulations were conducted with HSPICE V.A.2008.03, using the TSMC 0.18 µm technology for the gates and transistors.

## References

- [1] J.M. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits", 2nd edition, Prentice Hall, 2002, pp. 491-621.
- [2] N. Weste and K. Eshraghian, Principles of CMOS digital design. Reading, MA: Addison-Wesley, pp. 304–307.
- [3] A. Morgenshtein, A. Fish, I.A. Wagner, "Gate-Diffusion Input (GDI) A Novel Power Efficient Method for Digital Circuits: A Detailed Methodology", Proc. of 14th IEEE International ASIC/SOC Conference, pp. 39-43, USA, Sept. 2001.
- [4] Padmanabhan Balasubramanian and Johince John, "Low Power Digital Design Using Modified GDI Method", IEEE, 2006, pp. 190-193.
- [5] A. Morgenshtein, A. Fish, I.A. Wagner, "Gate-Diffusion Input (GDI) A Technique for Low Power Design of Digital Circuits: Analysis and Characterization", Proc. of ISCAS'02 Conference, vol. 1, pp. 477-480, USA, May 2002.
- [6] A. Morgenshtein, A. Fish, I.A. Wagner, "Gate-Diffusion Input (GDI) A Power Efficient Method for Digital Combinatorial Circuits", IEEE Transactions on Very Large Scale Integration Systems, vol. 10, no. 5, pp. 566-581, October 2002.
- [7] A. Morgenshtein, A. Fish, I.A. Wagner, "GDI (Gate Diffusion Input) Circuits for Fast and Low-Power Implementations of Logical Gates", US Patent Application no 60/406,751, 2003.
- [8] A. Morgenshtein, M. Moreinis and R. Ginosar, "Asynchronous Gate-Diffusion-Input (GDI) Circuits", IEEE Transactions on Very Large Scale Integration Systems, vol. 12, no. 8, pp. 847-856, August 2004.
- [9] H. T. Bui, Y. Wang, and Y. Jiang, "Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 1, pp. 25–30, Jan. 2002.
- [10] M. Vesterbacka, "A 14-transistor CMOS full adder with full voltage swing nodes," in Proc. IEEE Workshop Signal Processing Systems, Oct. 1999, pp. 713–722.
- [11] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Norwell, MA: Kluwer, 1995.
- [12] K. Yano et al., "A 3.8 ns CMOS 16x16 multiplier using complementary pass-transistor logic," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 388–395, Apr. 1990.
- [13] Chip-Hong Chang, Jiangmin Gu, and Mingyan Zhang, "A Review of 0.18-µm Full Adder Performances for Tree Structured Arithmetic Circuits," IEEE Trans. On Very Large Scale Integration (VLSI) Systems, Vol. 13, No. 6, June 2005, pp. 686–695.
- [14] N. Zhuang and H. Hu, "A new design of the CMOS full adder," IEEE J. Solid-State Circuits, vol. 27, no. 5, pp. 840–844, May 1992.
- [15] C. Chang, J. Gu, M. Zhang, "Ultra Low-Voltage Low- Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits," IEEE Transactions on Circuits & Systems, Vol. 51, No. 10, pp. 1985-1997, Oct. 2004.
- [16] A. Fish, A. Morgenshtein, "Gate-Diffusion Input (GDI) Technology for Area-Efficient Digital Circuits Implementations in Standard Fabrication Technologies", US Patent Application, 2005, pp. 27-28.
- [17] Frank Wanlass, "CMOS Technology: Challenges for Future Development", IEEE Spectrum, p. 44, May 1991.
- [18] A. Saberkari, S.B. Shokouhi, "A Novel Low-Power Low-Voltage CMOS 1-Bit Full Adder Cell with the GDI Technique", Proceedings of the 2006 IJME, pp. 2.
- [19] Jan M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. Prentice Hall Publications: India, 2006, Chapter 11, p. 590.
- [20] F. Moradi, Dag T. Wisland, H. Mahmoodi, S. Aunet, T. Vu Cao, A. Peiravi, "Ultra Low Power Full Adder Topologies", San Francisco State Univ., 2009 IEEE, pp. 3160.
- [21] A. Bazzazi, B. Eskafi, "Design and Implementation of Full Adder Cell With the GDI Technique Based on 0.18µm CMOS Technology", IMECS 2010, pp. 4-5.
- [22] K.K. Chaddha, R. Chandel, "Design and Analysis of a Modified Low Power CMOS Full Adder Using Gate-Diffusion Input Technique," Journal of Low Power Electronics, Volume 6, Number 4, American Science Publishers, December 2010, pp. 482-490.
- [23] AK Agrawal, S Wairya, RK Nagaria, "A New Mixed Gate Diffusion Input Full Adder Topology for High Speed Low Power Digital Circuits". World Applied Sciences Journal 7, IDOSI Publications, 2009, pp. 138-144.
- [24] M. Alioto, G. Di Cataldo, G. Palumbo, "Mixed Full Adder topologies for high-performance low-power arithmetic circuits", Microelectronics Journal, Vol 38, Issue 1, January 2007, pp. 130-139

# **Author Biographies**



**First Author: Mehdi Faed** received his BSc and MSc degrees in Electrical Engineering from Azad University (2006) and Sharif University of Technology (2010), respectively. He has taught a variety of Electrical Engineering courses to BSc students. His main interests are in the areas of hardware, digital signal processing, VLSI, and their applications. He also has broad technical experience in the mobile industry. He is a member of the Institute of Electrical and Electronics Engineers (IEEE). Corresponding author.

mehdi.faed@yahoo.com, mehdi.faed@raja.ac.ir



**Second Author: Mohammad Mortazavi** is currently a faculty member of the School of Science and Engineering at Sharif University of Technology, International Kish Campus. He received BS, MS, and PhD degrees in Electrical Engineering from the State University of New York at Binghamton in 1989, 1992, and 1995, respectively. Dr. Mortazavi worked in R&D in different business units of Cadence Design Systems, Inc. between 1995 and 2005. His research interests are in the areas of computer-aided design of integrated circuits, VLSI, timing, and FPGA design and methodology. He has co-authored several papers on timing design verification and validation. Dr. Mortazavi is a member of the Institute of Electrical and Electronics Engineers (IEEE).

mortazavi@sharif.ir



**Third Author: Alireza Faed** is a doctoral student at Curtin University of Technology. He has strong interests in e-Marketing, Information Technology, e-Sports Marketing, Sport Sponsorship, Supply Chain Management, e-Commerce and m-Commerce, SMM, and CRM. He previously served as a lecturer in Iran. He is a member of IEEE and the IEEE Intelligent Transportation Systems Society. He has also contributed to the "International Journal of Research and Innovation", "International Journal of Information Processing Management", "World Academy of Science, Engineering and Technology", and "Springer". He has also acted as a reviewer for the 3GPCIC 2010 conference in Japan.

alireza.faed@postgrad.curtin.edu.au