Leakage Power Reduction in Flip Flops by Using MTCMOS and ULP Switch

Saihua Lin
Department of Electronic Engineering
Tsinghua University
Beijing, China

Huazhong Yang
Department of Electronic Engineering
Tsinghua University
Beijing, China

Abstract-As the feature size of CMOS technology continues to scale down, leakage power has become an increasingly important part of the total power consumption of a chip. By analyzing the leakage paths of flip flops, we propose a method to reduce their leakage power. Experimental results show that the leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode, while the delay time stays the same and the area overhead is small.

I. INTRODUCTION

With the scaling of CMOS technology, leakage power is expected to become a significant portion of the total power consumption in future CMOS systems. Previous techniques, such as MTCMOS [1] and reverse body bias [2], focus on reducing leakage in combinational logic, whereas in this paper we reduce the leakage power of sequential logic, specifically flip flops. Due to the tighter timing constraints and performance requirements of digital systems, new flip-flop families have been developed and integrated into high-performance microprocessors. Among them, IP_DCO, shown in Fig. 1, is considered the fastest [3] and has a large negative setup time. However, its high power consumption limits its application in low-power integrated circuit design. Hence, in this paper, we first analyze the leakage paths in this flip flop and then use MTCMOS and ULP switch techniques to reduce the leakage power. Experimental results show that the leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode, while the delay time stays the same and the area penalty is very small.

The rest of the paper is organized as follows. In Section II, we analyze the leakage paths of IP_DCO and propose the new flip flop M_IP_DCOI. In Section III, we provide the experimental results for these two flip flops and further propose an implicit conditional discharge flip flop, M_IP_DCOII. Finally, Section IV concludes the paper.

1-4244-0173-9/06/$20.00 ©2006 IEEE.

Figure 1. IP_DCO

II. IP_DCO ANALYSIS AND IMPROVEMENT

A. Analysis

From the BSIM MOS transistor model [4], the subthreshold leakage current is given by

$$I_{DS} = I_0 \exp\left(\frac{V_{GS} - V_{th0} + \gamma V_{BS} + \eta V_{DS}}{n V_T}\right)\left(1 - e^{-V_{DS}/V_T}\right) \qquad (1)$$

where $V_T = kT/q$ is the thermal voltage, and $V_{GS}$, $V_{DS}$, and $V_{BS}$ are the gate-to-source, drain-to-source, and bulk-to-source voltages, respectively. $\gamma$ and $\eta$ are the body effect and DIBL coefficients, respectively, $n$ is the subthreshold slope coefficient, and $I_0 = \mu_0 C_{ox} (W/L_{eff}) V_T^2 e^{1.8}$. From (1), we can see that even when a transistor is "off", a current still flows through it, which results in leakage power.
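As a rough illustration of Eq. (1), the following Python sketch evaluates the subthreshold current of a single "off" transistor. The default values for `i0`, `gamma`, `eta`, and `n` are placeholder assumptions chosen only for illustration; they are not parameters of the process used in this paper.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(v_gs, v_ds, v_bs, v_th0,
                         i0=1e-7, gamma=0.2, eta=0.05, n=1.5,
                         temp_kelvin=300.0):
    """Evaluate Eq. (1). i0, gamma, eta and n are assumed, illustrative values."""
    v_t = thermal_voltage(temp_kelvin)
    exponent = (v_gs - v_th0 + gamma * v_bs + eta * v_ds) / (n * v_t)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-v_ds / v_t))


if __name__ == "__main__":
    # An "off" NMOS (V_GS = 0) with the full supply across drain and source.
    i_off = subthreshold_current(v_gs=0.0, v_ds=1.8, v_bs=0.0, v_th0=0.3)
    print(f"off-state current: {i_off:.3e} A")
```

The sketch makes the two levers visible: the current falls exponentially as `v_th0` rises, and it vanishes as `v_ds` approaches zero.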

Aside from the subthreshold leakage current in standby mode, leakage paths also exist in active mode. Fig. 2 shows the keeper in IP_DCO. When node a makes a low-to-high transition, node b makes a high-to-low transition one inverter delay later. Hence, until node b settles at 0 V, transistor N0 remains on. As a result, while node a is being charged, a leakage path exists and causes extra power consumption. A similar analysis applies when node a makes a high-to-low transition.

Figure 2. Keeper Analysis in IP_DCO

Figure 3. M_IP_DCOI

B. Improvement

By analyzing the leakage paths of IP_DCO in both active mode and standby mode, we propose a new circuit to reduce the leakage power, as shown in Fig. 3. The inner keeper and the output keeper are simplified and improved for leakage reduction. For example, consider the case where Q in Fig. 3 makes a low-to-high transition. After clk goes from low to high, node xin is discharged quickly. As a result, M1 is turned on while N7 is turned off, which reduces the leakage power in active mode.

To reduce the leakage power in standby mode, we use high-Vth transistors. For example, the low-Vth three-inverter chain is replaced with a high-Vth three-inverter chain. Another technique used to reduce the leakage power is the ULP switch, as shown in Fig. 4. In [5], the authors proposed a ULP latch to reduce the standby power of CMOS flip flops in SOI technology. However, this technique is difficult to implement in a CMOS technology because of the effect of the bulk. Thus, in this paper, we modify this technique and propose a ULP switch technique. Similar results have been obtained in [6]. Inserting the ULP switch in the inverter reduces the leakage power of the inverter because, in standby mode, both the NMOS and PMOS transistors operate nearly in the cutoff region. Fig. 5 shows the effectiveness of the ULP switch for a minimum-sized inverter whose input is set to zero. As the power supply voltage increases, the leakage power saving grows. Furthermore, by adjusting the sizes of the ULP switch transistors, we can easily change the total delay of the inverter chain.
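To see why replacing low-Vth devices with high-Vth ones helps in standby, note that Eq. (1) implies that two otherwise identical transistors at the same bias differ in leakage by a factor of exp(ΔVth / (n·V_T)). The Python sketch below evaluates that factor; the slope coefficient `n = 1.5` and the thermal voltage of 25.9 mV are assumed values, not figures from this paper.

```python
import math


def leakage_ratio(delta_vth, n=1.5, v_t=0.0259):
    """Ratio I_lowVth / I_highVth implied by Eq. (1) for two devices that
    differ only in threshold voltage by delta_vth (volts).
    n and v_t are assumed, illustrative values."""
    return math.exp(delta_vth / (n * v_t))


if __name__ == "__main__":
    for delta in (0.05, 0.10, 0.15):
        print(f"delta Vth = {delta:.2f} V -> leakage ratio ~ {leakage_ratio(delta):.1f}x")
```

Under these assumptions, a threshold increase of about 0.1 V already cuts subthreshold leakage by roughly an order of magnitude, at the cost of slower switching in the affected devices.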

Figure 4. Symbol definition (high-Vth transistor, low-Vth transistor, inverter with ULP switch, inverter without ULP switch)

Figure 5. Comparison of leakage power with and without the ULP switch for a minimum-sized inverter (horizontal axis: power supply in V; curves: without ULP switch, with ULP switch).

III. RESULTS

In this section, we first compare the leakage power of IP_DCO and M_IP_DCOI in standby mode, and then compare the power and delay time in active mode. Finally, we propose a new implicit conditional discharge flip flop at the end of this section.

A. Standby Mode Leakage Power Comparison

IP_DCO and M_IP_DCOI are implemented in a 0.18 μm CMOS technology and simulated using HSPICE. Two threshold voltages (Vth) are available for both NMOS and PMOS transistors. High-speed NMOS (PMOS) transistors feature Vth = 0.1646 V (-0.2253 V). Low-leakage NMOS (PMOS) transistors feature Vth = 0.3075 V (-0.4555 V). The minimum length and width for high-speed NMOS/PMOS transistors are 0.24 μm and 0.24 μm, respectively. The minimum length and width for low-leakage NMOS/PMOS transistors are 0.18 μm and 0.24 μm, respectively. Thus, although M_IP_DCOI has more transistors than IP_DCO, the total L·W products of the two circuits are still comparable (2.7264 μm² and 2.8912 μm²) when the two circuits are optimized to have the same data-to-Q (D-Q) time.

Table I shows the leakage power comparison of the two circuits in standby mode. When clk is high, the internal node xin can be either high or low, so in Table I (1) denotes the case when xin is high and (0) denotes the case when xin is low. The proposed method is very effective: the leakage is reduced by an average of 72.35% compared to the original circuit.

TABLE I. LEAKAGE POWER COMPARISON IN STANDBY MODE

D     Clk   IP_DCO (nW)     M_IP_DCOI (nW)
0     Vdd   182.0763 (1)    83.5982 (1)
            320.1770 (0)    59.2497 (0)
Vdd   Vdd   182.0764 (1)    83.5983 (1)
            320.1770 (0)    59.2497 (0)
Vdd   0     493.3686        71.6789
0     0     317.7729        71.6528
Minimum leakage power reduction: 54.09%
Average leakage power reduction: 72.35%

B. Active Mode Power Comparison

We first optimize the two circuits so that they have the same D-Q time, and then compare their power consumption. The delay time is defined as max(D-Q(hl), D-Q(lh)), and the PDP is defined as the product of delay time and power. As shown in Table II, M_IP_DCOI performs about 26% better in terms of PDP.

TABLE II. DELAY AND POWER COMPARISON AFTER OPTIMIZATION

DFF        CL (fF)   D-Q(lh) (ps)   D-Q(hl) (ps)   Power (μW)   PDP (fJ)
Original   20        106            92             15.92        1.688
Proposed   20        105            92             11.90        1.250

We further examine the power consumption when different input data patterns are applied. Pattern0 represents clk = 100 MHz and D = 20 MHz, which is the typical case, in which the internal node xin has redundant switching activity. Pattern1 represents clk = 100 MHz and D = 500 MHz. Pattern2 represents clk = 100 MHz and D = 100 MHz, in which the switching activity of the data is comparable to that of the clock. Pattern3 represents clk = 100 MHz and D = 0, in which the power is mainly due to the inverter chain. Pattern4 represents clk = 100 MHz and D = 1, in which the power is mainly due to the switching activity of the internal node xin. Table III and Fig. 6 show the power comparison for these five cases.

From these results, the new flip flop achieves an average power reduction of 21.88% compared to the original one. The minimum power reduction occurs when the input data is constant zero. In this case, the power is mainly due to the inverter chain of IP_DCO or M_IP_DCOI, which accounts for more than one third of the total power consumption. Therefore, we can shorten the inverter chain to minimize the total power consumption as long as the function is not affected.

TABLE III. POWER COMPARISON IN DIFFERENT PATTERNS

DFF        Pattern0 (μW)   Pattern1 (μW)   Pattern2 (μW)   Pattern3 (μW)   Pattern4 (μW)
Original   15.92           19.54           16.57           5.41            18.16
Proposed   11.90           14.43           12.10           5.28            12.96
Minimum power reduction: 2.40%
Average power reduction: 21.88%

Figure 6. Power consumption comparison in different patterns
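The summary figures quoted with Tables I to III follow from simple arithmetic on the tabulated values. The Python sketch below recomputes them from the (original, proposed) pairs; the function names are illustrative, and the averages are taken as the mean of the per-case reductions, which is an assumption about how the summary rows were formed (it reproduces the reported percentages).

```python
def reduction_percent(original, proposed):
    """Relative saving of `proposed` with respect to `original`, in percent."""
    return 100.0 * (original - proposed) / original


def summarize(pairs):
    """Return (minimum, mean) of the per-case reductions in percent."""
    reductions = [reduction_percent(o, p) for o, p in pairs]
    return min(reductions), sum(reductions) / len(reductions)


def pdp_fj(dq_lh_ps, dq_hl_ps, power_uw):
    """PDP = max(D-Q(lh), D-Q(hl)) * power. ps * uW = 1e-18 J = 1e-3 fJ."""
    return max(dq_lh_ps, dq_hl_ps) * power_uw * 1e-3


# Standby leakage pairs in nW: (IP_DCO, M_IP_DCOI), values from Table I.
TABLE_I = [
    (182.0763, 83.5982), (320.1770, 59.2497),
    (182.0764, 83.5983), (320.1770, 59.2497),
    (493.3686, 71.6789), (317.7729, 71.6528),
]

# Active power pairs in uW: (original, proposed), values from Table III.
TABLE_III = [
    (15.92, 11.90), (19.54, 14.43), (16.57, 12.10),
    (5.41, 5.28), (18.16, 12.96),
]

if __name__ == "__main__":
    print("Table I   min/avg reduction (%%): %.2f / %.2f" % summarize(TABLE_I))
    print("Table III min/avg reduction (%%): %.2f / %.2f" % summarize(TABLE_III))
    pdp_orig = pdp_fj(106, 92, 15.92)
    pdp_prop = pdp_fj(105, 92, 11.90)
    print("Table II PDP (fJ): %.3f vs %.3f" % (pdp_orig, pdp_prop))
    print("PDP improvement (%%): %.1f" % reduction_percent(pdp_orig, pdp_prop))
```

Because the summary rows depend only on the value pairs, the check does not rely on which input condition each pair belongs to.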

Fig. 7 and Fig. 8 show the behavior of both IP_DCO and M_IP_DCOI as the supply voltage is varied. The delay times of the two circuits are comparable when the supply voltage is low. However, when the supply voltage is high, the proposed flip flop shows better delay than the original one, as shown in Fig. 7. From Fig. 8, we can see that the proposed flip flop is more power efficient: its PDP is much smaller than that of the original one over the whole supply voltage span, by at least 24.89% and by 25.96% on average.

Figure 7. Delay vs. Supply Voltage

Figure 8. PDP vs. Supply voltage

We also examine the behavior of both IP_DCO and M_IP_DCOI as the load capacitance is varied. The proposed flip flop remains superior to the original one, as shown in Fig. 9.

Figure 9. PDP vs. capacitance load

C. Further Discussion

Both IP_DCO and M_IP_DCOI have the problem that the internal node xin is pre-charged redundantly even when D is constant high. To eliminate this redundant switching, we can apply a conditional method, such as the conditional pre-charge method or the conditional discharge method [7]. However, with the conditional pre-charge method, the voltage of node xin is kept at 0 if D is constant high. Thus, if Q is to make a high-to-low transition, xin first has to be charged high before Q can drop. As a result, the high-to-low D-Q time is much larger than the low-to-high D-Q time, which makes the circuit worse than the original IP_DCO. Hence, in this paper, we propose an implicit conditional discharge flip flop, as shown in Fig. 10.

Previous work focused on the explicit conditional discharge flip flop and excluded the power of the pulse generator from the power comparison [7]. However, the pulse generator can consume considerable power, so the explicit style is not superior to the implicit style [3]. Moreover, using the explicit style requires designing a pulse generator first, which is not very convenient. Therefore, in this paper, we propose the implicit conditional discharge flip flop.

Figure 10. M_IP_DCOII
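The effect of conditional discharge on the internal node can be illustrated with a small behavioral model in Python. This is a simplified sketch and not the transistor-level M_IP_DCOII circuit: it assumes that xin is pre-charged high while clk is low, that an unconditional design discharges xin on every rising edge with D = 1, and that a conditional discharge design discharges xin only when D = 1 and Q = 0. The function name and the cycle-level abstraction are illustrative assumptions.

```python
def count_xin_transitions(data_bits, conditional_discharge):
    """Count transitions of the internal node xin over a sequence of clock cycles.

    data_bits: value of D sampled in each clock cycle (0 or 1).
    conditional_discharge: if True, xin discharges only when D = 1 and Q = 0.
    """
    q = 0            # flip flop output, assumed to start low
    xin = 1          # internal node, assumed to start pre-charged
    transitions = 0
    for d in data_bits:
        # clk low: pre-charge xin if it was discharged in the previous cycle
        if xin == 0:
            xin = 1
            transitions += 1
        # clk rising edge: decide whether xin is discharged
        if d == 1 and (q == 0 or not conditional_discharge):
            xin = 0
            transitions += 1
        q = d        # the flip flop captures D
    return transitions


if __name__ == "__main__":
    constant_high = [1] * 8
    print("unconditional:", count_xin_transitions(constant_high, False))
    print("conditional:  ", count_xin_transitions(constant_high, True))
```

In this model, a constant-high input makes xin toggle in every cycle for the unconditional design, whereas the conditional design discharges xin once and then leaves it idle, which is the redundant activity the conditional discharge method removes.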

Figure 11. Waveform of IP_DCO

Figure 12. Waveform of M_IP_DCOII

Fig. 11 and Fig. 12 show the differences between these two styles of flip flops. When D is high, the internal switching activity of M_IP_DCOII is lower than that of IP_DCO. M_IP_DCOII also exhibits fewer glitches than IP_DCO. Due to these reductions, the power of M_IP_DCOII is only 4.296 μW in this case, whereas the power of IP_DCO is 9.515 μW, more than twice that of M_IP_DCOII.

One drawback of M_IP_DCOII compared to IP_DCO is that more transistors are used, and hence the chip area is increased. Another drawback is that the delay time of M_IP_DCOII is larger than that of IP_DCO because there are three transistors in the discharge path of the second stage. In addition, some glitches still exist in M_IP_DCOII. However, due to the reduction in total power, the PDP is still much smaller than that of IP_DCO. For example, the PDP of IP_DCO is 1.009 fJ while the PDP of M_IP_DCOII is only 0.653 fJ, which is 35.3% less than that of IP_DCO.

IV. CONCLUSION

In this paper, we first analyze the leakage power consumption in IP_DCO, which is one of the fastest flip flops. Then, we propose a new flip flop, M_IP_DCOI, which uses MTCMOS and a ULP switch to reduce the leakage power. Experimental results show that the proposed flip flop performs better than the original one even when the supply voltage and the load capacitance are varied, while the area overhead is very small. The leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode. Finally, we propose a new implicit conditional discharge flip flop to reduce the redundant switching activity of the internal node xin, and find that considerable power can be saved when the data input is constant high.

REFERENCES

[1] S. Shigematsu et al., "A 1-V high-speed MTCMOS circuit scheme for power-down applications," in Proc. IEEE Symp. VLSI Circuits Dig. Tech. Papers, 1995, pp. 125-126.
[2] T. Kobayashi and T. Sakurai, "Self-adjusting threshold-voltage scheme (SATS) for low-voltage high-speed operation," in Proc. IEEE Custom Integrated Circuits Conf., 1994, pp. 271-274.
[3] J. Tschanz et al., "Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors," in Proc. ISLPED'01, Huntington Beach, CA, Aug. 2001, pp. 207-212.
[4] B. J. Sheu et al., "BSIM: Berkeley short-channel IGFET model for MOS transistors," IEEE J. Solid-State Circuits, vol. 22, pp. 558-566, Aug. 1987.
[5] David Levacq, Vincent Dessard, and Denis Flandre, "Ultra-low power flip-flops for MTCMOS circuits," in Proc. ISCAS'05, pp. 4681-4684.
[6] Narender Hanchate and Nagarajan Ranganathan, "LECTOR: A technique for leakage reduction in CMOS circuits," IEEE Trans. VLSI Syst., vol. 12, no. 2, Feb. 2004, pp. 196-205.
[7] Peiyi Zhao, Tarek K. Darwish, and Magdy A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," IEEE Trans. VLSI Syst., vol. 12, no. 5, May 2004, pp. 477-484.
[8] Vladimir Stojanovic and Vojin G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, no. 4, Apr. 1999, pp. 536-548.
