1956

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 11, NOVEMBER 2008

Skewed Flip-Flop and Mixed-Vt Gates for Minimizing Leakage in Sequential Circuits

Jun Seomun, Jae-Hyun Kim, and Youngsoo Shin, Senior Member, IEEE

Abstract—Mixed Vt has been widely used to control leakage without affecting circuit performance. However, existing approaches target only combinational circuits, even though sequential elements such as flip-flops contribute an appreciable proportion of the total leakage. Applying high Vt to ordinary flip-flops would reduce the number of combinational gates that can be assigned to high Vt, because any timing slack would be absorbed by the increased setup guard time and propagation delay of the high-Vt flip-flops. A skewed flip-flop (SFF) can be constructed by replacing a subset of transistors in a conventional flip-flop with low-leakage devices, such as large-Lgate transistors. SFFs exhibit highly skewed leakage and delay characteristics, which depend on the transistors that are replaced. Our algorithm selectively substitutes SFFs for conventional flip-flops in sequential circuits so as to reduce leakage while continuing to satisfy the timing constraint. When combined with mixed-Vt combinational circuits, this achieves an average leakage saving of 15% compared to mixed Vt alone. The leakage of the flip-flops themselves is cut by 25% on average.

Index Terms—Flip-flop, leakage current, low power, mixed Vt, sequential circuit.

I. INTRODUCTION

The scaling down of transistor size has resulted in a dramatic increase in leakage current. MOSFET threshold voltages are commonly scaled down to compensate for the reduced circuit performance at a low supply voltage, which leads to an exponential increase in subthreshold leakage. Gate oxide is also scaled down for better control of the MOSFET channel current, which contributes to the growth of gate leakage. The overall leakage current has now become a major contributor to total power consumption. In 90-nm, 65-nm, and other recent nanometer CMOS technologies, it is not uncommon for leakage current to be responsible for almost half of the total power consumption [1]. Many circuit-level approaches have been proposed to control leakage in standby mode. Power gating [2]–[7] suppresses standby leakage by cutting off a circuit from its power supply.

Manuscript received September 3, 2007; revised December 19, 2007 and April 3, 2008. Current version published October 22, 2008. This work was supported in part by Samsung Electronics and in part by the Brain Korea 21 Project, School of Information Technology, KAIST, Daejeon, Korea, in 2007. This paper was recommended by Associate Editor S. Vrudhula. J. Seomun and Y. Shin are with the Department of Electrical Engineering, KAIST, Daejeon 305-701, Korea (e-mail: [email protected]; [email protected]). J.-H. Kim is with the System LSI Division, Semiconductor Business, Samsung Electronics, Yongin 446-711, Korea (e-mail: jh6324.kim@samsung.com). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCAD.2008.2006084

Fig. 1. Distribution of leakage in combinational subcircuits and flip-flops of several sequential circuits. (a) Before and (b) after the application of mixed Vt .

Reverse body bias [8], [9] modulates the body bias so that the MOSFET threshold voltage is effectively increased, which reduces standby leakage. The amount of leakage that can be saved during standby is substantial; however, due to the delay and energy required for the transition between active mode and standby, these techniques are only justified during long idle periods [10]. Circuit-level techniques also involve substantial custom engineering for their implementation. For example, the changes required for power gating [11] include the sizing of a current switch, the design of retention flip-flops, and a custom power network. Moreover, circuit-level power saving involves significant customization of designs based on standard cells, because its implementation departs significantly from the standard design flow [12], [13].

In contrast, mixed-Vt, mixed-Tox, and mixed-Lgate techniques are transparent to designers, since they can be seamlessly integrated with most design tools. These techniques also save leakage in all modes of operation, not just in standby, although they save much less leakage than power gating or reverse body bias. The mixed-Vt technique is particularly popular for suppressing subthreshold leakage. A mixed-Vt circuit [14] uses gates with a low threshold voltage (Vt) on timing-critical paths and gates with high (and/or normal) Vt on paths that are not critical to timing. Many algorithms have been proposed for the deployment of mixed Vt [15]–[19], but they all target the combinational portion of a circuit, even though sequential elements such as flip-flops are responsible for an appreciable proportion of the total leakage.

We assessed the importance of flip-flop leakage by taking several circuits from benchmarks, as well as from industrial designs, and simulating them with SPICE in a commercial 65-nm technology model. Fig. 1(a) shows that the flip-flops contribute, on average, 44% of the total leakage of these circuits, which is substantial. However, if mixed Vt is used in designing the combinational subcircuits [17], then the proportion of leakage that occurs in the flip-flops rises to 63% on average and can be as high as 80%, as shown in Fig. 1(b).

0278-0070/$25.00 © 2008 IEEE

Authorized licensed use limited to: Korea Advanced Institute of Science and Technology. Downloaded on December 9, 2009 at 19:56 from IEEE Xplore. Restrictions apply.

SEOMUN et al.: SFF AND MIXED-Vt GATES FOR MINIMIZING LEAKAGE IN SEQUENTIAL CIRCUITS

This motivates us to apply high Vt to some flip-flops (to an extent that does not violate the delay constraint of the circuit) and then to apply conventional mixed Vt to the combinational subcircuits. However, the proportion of combinational gates that can be assigned to high Vt can be expected to fall significantly, because any timing slack will have been absorbed by the increased setup guard time and propagation delay of the flip-flops that are assigned to high Vt. This is readily seen in Fig. 2(a), which shows the increase in delay when a high-Vt gate is used instead of a low-Vt gate. Moreover, the number of flip-flops that can take advantage of high Vt may not be very large, since assigning high Vt to a flip-flop typically affects more than one timing path in a circuit, as Fig. 2(b) clearly shows.

Previous attempts to reduce the power of sequential circuits have mainly focused on the switching power of flip-flops, e.g., by reducing the clock swing of flip-flops [20] or by assigning states so that flip-flops undergo fewer state changes [21], [22]. There has been an attempt to reduce flip-flop leakage by mixing high and low Vt within the same flip-flop [23], but how to exploit such flip-flops in sequential circuits, and how to combine them with the design of low-leakage combinational circuits, have not been investigated.

In this paper, we propose the concept of skewed flip-flops (SFFs). These flip-flops are constructed by substituting low-leakage devices, such as large-Lgate devices, for different combinations of transistors in a conventional flip-flop, which results in very unequal leakage (and timing) characteristics for different present- and next-state combinations. We then propose an algorithm that uses these SFFs to reduce the overall flip-flop leakage while maintaining the original cycle time of a circuit. After that, as in previous work, we apply conventional mixed-Vt design to the combinational portion of the circuit. We have measured the leakage of several benchmark circuits, as well as circuits extracted from industrial designs, comparing SFF implementation with the application of conventional mixed Vt to combinational subcircuits alone. The results show that we can reduce leakage by an additional 15% on average.

The remainder of this paper is organized as follows. In Section II, we present mixed-Lgate devices, which are the basis of SFFs, and then discuss the design of SFFs and their leakage and timing characteristics. An algorithm that combines SFFs with mixed-Vt allocation to minimize the leakage of sequential circuits is proposed in Section III. The experimental procedure and the results for several sequential circuits are presented in Section IV, and we draw conclusions in Section V.

II. SFFs

Fig. 2. (a) Difference in delay between high and low Vt for several different gates. (b) Ratio of the average number of timing paths through flip-flops to the average number through combinational gates, for several different circuits.

A. Mixed Lgate

Because the application of high Vt to ordinary flip-flops can incur enough delay to eliminate any timing slack in the combinational subcircuits, we selectively deploy low-leakage devices inside the flip-flops. There are various ways to do this, namely, high-Vt transistors, the stacking effect, transistors with a large Lgate, and so on. The selective use of high-Vt transistors can lead to a large increase in area due to the design rules involved in the use of the HVT layer, such as the minimum spacing between adjacent HVT layers and between the boundary of an HVT layer and the poly layer. The use of the stacking effect also yields a large increase in area, since the increased delay from stacking has to be compensated by increasing transistor size. Therefore, we only consider transistors with a larger Lgate, although any kind of low-leakage device could conceptually be used to construct SFFs.

The mixed-Lgate approach involves exploiting devices of multiple gate lengths (Lgate's) in the same gate. Using a 130-nm industrial process, it has been reported [24] that an 8-nm increase in gate length reduces leakage by 30% at the expense of a 5% increase in delay in an inverter of minimum size. This combination of a large drop in leakage and a small increase in delay occurs because the normal gate length of the technology is usually very close to the knee of the curve of leakage against gate length, which is, in turn, due to short-channel effects. The magnitude of this effect grows in nanometer technologies: a 10% increase in gate length is reported to reduce leakage by a factor of three in 65-nm technology [25]. To see the effect of a large Lgate on leakage as technology scales down, we simulated a minimum-size inverter in three predictive technology nodes [26]. Fig. 3 compares the leakage of an inverter of normal gate length with that of an inverter whose gate length is 10% larger than normal. Increasing the gate length can increase gate leakage, but its effect on total leakage is not significant, as can be readily seen in the figure. Fig. 4 shows the leakage current and delay for an inverter implemented in the commercial 65-nm technology that we used for the experiments reported in this paper. We allocate a 5-nm increase in gate length to large-Lgate devices, which are used together with normal-Lgate devices in our SFFs. This level of increase in gate length does not affect printability during manufacture, and in most cases, pin compatibility can be maintained with the cells of normal-Lgate devices, which benefits postplacement optimization [24].

Fig. 3. Leakage current per unit area for an inverter of normal gate length and for an inverter whose gate length is 10% larger than normal.

Fig. 4. Leakage current and delay for an inverter implemented in a commercial 65-nm technology.

B. Design of an SFF

Fig. 5 shows an example of a positive-edge-triggered D flip-flop. The leakage of this flip-flop is determined by its D-input and Q-output.¹ We define groups of transistors that are turned off (and thus become leakage sources) for a given input or output of the flip-flop. Table I shows these groups. The transistors of group D0, for example, are turned off when the D-input is low (see Fig. 5). The transistors M2, M3, M8, M9, M12, M13, M16, and M17 are driven by the clock, so they do not belong to any group.

¹ We assume the clock input (CK) to be low when the circuit is idle, which is the period of interest as regards leakage. This is a reasonable assumption, given the widespread use of clock gating.

Fig. 5. Sample D flip-flop with an inverter and a tristate inverter implementation.

TABLE I GROUPS OF TRANSISTORS THAT MAKE UP LEAKAGE SOURCES FOR A GIVEN D-INPUT OR Q-OUTPUT

Depending on the pair of groups (one for the D-input and another for the Q-output) that we take for large-Lgate devices, we can design four different new flip-flops, which are all examples of SFFs. An SFF of type SF00, for example, has large-Lgate transistors belonging to groups D0 and Q0. Since we increase the gate length of the transistors that are the source of leakage when both the D-input and Q-output are low, the leakage of an SF00 can be made very low for that input–output combination. For two other input–output combinations (D low and Q high, or D high and Q low), its leakage also becomes lower than that of the original flip-flop. Even when both the D-input and the Q-output are high, the leakage is still reduced, for a reason discussed shortly.

The transistors that are driven by the clock need separate attention. The transistors that are turned on (M2, M3, M16, and M17) when the circuit is idle are implemented in normal Lgate. Two of the four transistors in the tristate inverter (M7, M8, M9, and M10), namely, M8 and M9, are turned off. Since the leakage through the tristate inverter is already small due to the stacking effect, M8 and M9 are not candidates for large Lgate. The same is true of M12 and M13 in SF00 and SF11, because the input and output of the tristate inverter G1 are different. In an SF01, however, both the input and output of G1 are low, and the leakage through M13 and M14 is small, since both are turned off. Nevertheless, M12 lies between Vdd and the output, which is low, with M11 turned on; M12 is thus a source of leakage. The transistor M12 in an SF01 can therefore be replaced by a large-Lgate device, as can the transistors in groups D0 and Q1. Similarly, in an SF10, we can replace M13 with a large-Lgate device, as well as the transistors in groups D1 and Q0.

Most flip-flops generate both phases of the clock signal internally through cascaded inverters, as shown in Fig. 5. We can therefore use large-Lgate devices for M24 and M25, since they are turned off when the circuit is idle. Obviously, this affects the internal clock (clk) waveforms, which in turn affect the timing characteristics of the flip-flop, as discussed in Section II-C.

In order to test our SFFs, we used a commercial 65-nm technology model. For the conventional D flip-flop shown in Fig. 5, we used large-Lgate devices with a length of 65 nm, compared to the 60-nm length of normal devices. The introduction of some large-Lgate devices leaves the layout of the flip-flop almost unchanged, since we are able to exploit some marginal area present in the original flip-flop, as shown in Fig. 6.

Fig. 6. Layout of (a) an original flip-flop and (b) an SFF of type SF00.

Fig. 7. Comparison of leakage between the four types of SFF and an original flip-flop. (a) SF00. (b) SF01. (c) SF10. (d) SF11.

Fig. 7 compares the leakage of an original flip-flop and the four types of SFF. As expected, SF00 exhibits the lowest leakage when both the D-input and the Q-output are low.
Note that the leakage when both the D-input and the Q-output are high also drops, due to the substitution of large-Lgate devices for M24 and M25, although this is inevitably the smallest reduction. The average leakage of the original flip-flop is about 8.1 nA. The figure for SF00 is 77% of that, and the equivalent figures for SF01, SF10, and SF11 are 79%, 76%, and 80%, respectively. Note that Fig. 7 shows the leakage currents in the idle state, and these are all lower than those of a conventional flip-flop. The increased gate length of some transistors in the SFFs could, however, lead to an increased switching current, which may outweigh the reduced leakage current in the active state. Our experiments showed that the selective use of large-Lgate devices with a length of 65 nm yields almost the same switching power (less than 1% increase) as circuits of normal flip-flops.

TABLE II TIMING CHARACTERISTICS OF SFFS

C. Timing Characteristics of SFFs

The timing parameters of SFFs, comprising the setup time Tsu and the clock-to-Q delay tc−q, are shown in Table II, together with those of a conventional flip-flop. Because of the way in which we select a subset of transistors for large-Lgate devices, the SFFs exhibit very asymmetric timing behavior. Consider the SF00 type, for example: its rising setup time increases by 1 ps, while its falling setup time decreases by about 2 ps; its rising clock-to-Q delay increases by about 4 ps, and its falling clock-to-Q delay increases by about 2 ps. Fig. 8 shows the waveforms that explain the timing characteristics of SF00. Note that the timing of a flip-flop is measured with respect to its clock input (CK) and is affected by the late arrival of clk and its complement, which are generated internally and thus lag behind CK (see Fig. 5).

The rising D-input (see Figs. 5 and 8) is captured in the master latch using M4, M5, and M10, which are all slower in an SF00 than in a conventional flip-flop, because we have increased their gate lengths. However, a D-input can be captured only after clk arrives at the gate input of M9, and clk arrives later than it does in a conventional flip-flop (for the same rising CK), because the gate lengths of M24 and M25 have also been increased. Fig. 8(a) explains the increase of 1 ps in the rising setup time (T′su − Tsu), although the increase in delay from the D-input to clk (T′1 − T1) is even longer. A falling D-input is not affected by large-Lgate devices, while clk is delayed due to M24. That is why, as shown in Fig. 8(b), the falling setup time is reduced rather than increased. However, the rising and falling clock-to-Q delays of an SF00 increase by about 4 and 2 ps, respectively, which is understandable, since both the Q-output and clk waveforms (as well as the complementary clock) lag behind CK, as shown in Fig. 8(c) and (d).
The timing parameters of the other types of SFF can be understood in a similar way. The average overall delay (Tsu + tc−q) of an SF00 is 114.3 ps, which is 2.7 ps longer than that of a conventional flip-flop; an SF10 has the largest increase, 3.7 ps. Since the delay of a minimum-size inverter (without load) is 12.5 ps, the timing overhead of the SFFs is in general not significant, which gives us flexibility when we come to substitute SFFs for ordinary flip-flops.

Fig. 8. Comparison of an SFF of type SF00 and a conventional flip-flop. (a) Rising Tsu. (b) Falling Tsu. (c) Rising tc−q. (d) Falling tc−q.

D. Half-SFFs (HSFFs)

The allocation of SFFs will be discussed in Section III. If this process only involved the conversion of original low-Vt flip-flops to SFFs (and vice versa), it would be overly abrupt in terms of leakage and timing. To achieve a smoother transition, we define two types of intermediate devices, called HSFFs. HSF0 uses large-Lgate devices for transistors M15, M18, M19, M20, M21, and M22 (see Fig. 5), while HSF1 has increased gate lengths for transistors M1, M4, M5, and M6. The inverters that generate the clock signals retain their normal gate lengths. It is easy to see that the introduction of an HSF0 will not affect the setup time; likewise, the clock-to-Q delay remains the same for an HSF1, as shown in Fig. 9(a). Compared to an ordinary flip-flop, the overall increase in delay due to the use of an HSF0 is 2.2 ps on average; for an HSF1, the increase is 3.7 ps. The reduction in leakage achieved by the HSFFs is not significant, as shown in Fig. 9(b), but these devices are able to guarantee either the setup time or the clock-to-Q delay, unlike the SFFs, which affect both timing parameters.

Fig. 9. HSFFs. (a) Timing parameters. (b) Leakage current.

III. SFF ALLOCATION

Our approach is to use SFFs (together with HSFFs) to minimize the leakage of the sequential components of a circuit, such as flip-flops, and then to apply mixed Vt to the combinational gates. The leakage of the sequential and combinational components is mutually dependent, because the timing slack exploited to minimize the leakage of one component could alternatively be used to minimize the leakage of the other. This suggests that we should try to solve the two problems, namely, SFF allocation and mixed-Vt allocation, at the same time, but finding an optimal solution to the combined problem seems impractical. Since the introduction of SFFs involves only a marginal increase in delay, we first minimize the leakage of the sequential components using SFFs (and HSFFs) while satisfying the delay constraint, and then submit the combinational subcircuits to conventional mixed-Vt allocation.

A. Design Process Overview

The input to our design process is a netlist for a sequential circuit, which is obtained from conventional logic synthesis. Timing constraints are specified as a cycle time, an arrival time at each primary input, and a required arrival time at each primary output. We assume that all the gates, including flip-flops, are initially low-Vt devices and that there is no negative slack in the circuit (i.e., the circuit in its fastest form satisfies the timing constraints). From the initial netlist, we first identify the timing paths that violate the timing constraint when the flip-flop at the start or the end of the path, or at both, is replaced by an SFF or an HSFF. We will give an example of this process using the sequential circuit shown in Fig. 10, considering only SFFs for simplicity of explanation.

Fig. 10. Sample sequential circuit.

Let us examine the timing path that starts at FF1 and ends at FF3. From the allowable delay on this path, the cycle-time constraint, and the timing parameters of the SFFs that may be inserted at FF1, at FF3, or at both, we can readily identify the combinations of SFFs that do not violate the timing constraint. Let us assume that the timing is satisfied when we use an SF00 for FF1 and an SF01 for FF3. This situation is marked by a 1 in the corresponding entry of the table in Fig. 11(a). Entries of 0 indicate that the corresponding combination of SFFs violates the timing constraint. This table can be successively simplified by identifying inconsistent, inferior, and redundant entries. Fig. 11(b) shows that the path from FF1 to PO1, one of the primary outputs, violates the timing constraint when an SF00 replaces FF1, and so the corresponding row can be removed. We are also able to remove the row corresponding to the insertion of an SF01 at FF3, because that would violate the timing of the path from FF3 to FF1, whichever flip-flop we use for FF1 (even the original low-Vt device). Since the table is symmetric with respect to flip-flop pairs (e.g., FF1 and FF3 appear both as rows and as columns), removing rows in Fig. 11(b) allows the corresponding columns to be removed, as shown in Fig. 11(c), leading to the simplified table of Fig. 11(d).

We then search for rows and columns that correspond to inferior reductions in leakage. For a flip-flop $i$, we know the probability $\rho_i$ that its D-input is high and the probability $\xi_i$ that its Q-output is high (see the Appendix). We can use these probabilities, together with our knowledge of the leakage of the SFF under consideration from Fig. 7, to estimate the flip-flop leakage $L_i$:

$$L_i = (1-\rho_i)(1-\xi_i)\,l_{00} + (1-\rho_i)\,\xi_i\,l_{01} + \rho_i\,(1-\xi_i)\,l_{10} + \rho_i\,\xi_i\,l_{11} \tag{1}$$

where $l_{jk}$ is the leakage when the input of the flip-flop is $j$ and its output is $k$. To simplify the explanation of Fig. 11(e), we assume that the leakage increases monotonically as we deploy an SF00, an SF01, an SF10, an SF11, and the original low-Vt flip-flop, in that order, independent of the state probabilities of the different flip-flops. The row corresponding to the use of an ordinary flip-flop for FF1 is inferior, in terms of leakage saved, to the row corresponding to SF10, and so the former can be removed. Note that the third row, corresponding to FF1, is not removed, even though an SF10 saves less leakage than an SF01, because this row contains combinations (an SF00 or the original low-Vt flip-flop inserted at FF3) that do not exist in the corresponding entries of the second row. The removal of further rows and columns can be understood similarly. Symmetry allows us to remove the columns corresponding to the rows that we have just eliminated, resulting in the table of Fig. 11(f). In Fig. 11(g), two sets of entries between FF1 and FF3 are marked. It can be readily seen that the entries in the dotted box include all the entries in the solid box, which implies that the former is redundant and can be removed. In the simplified table of Fig. 11(h), we have allocated SF01 to FF2, since it is the only option. In the final situation, shown in the table of Fig. 11(i), we only need to determine the type of SFF (or HSFF) to be used for FF1, FF3, FF4, and FF5.

B. SFF Allocation

Using the data in the simplified table, we employ a branch-and-bound method to search for a set of SFFs (and HSFFs) that minimizes leakage while satisfying the timing constraint. The candidate SFFs from Fig. 11(i) are shown with their leakage in Fig. 12(a). Fig. 12(b) shows the search tree, where the number in each circle indicates the order of search. At node 4, we encounter a conflict, because the allocation of SFFs to FF5, FF4, and FF1 at that point precludes any feasible candidate for FF3 [see Fig. 11(i)]. Node 7 yields the first value of the cost (leakage). From then on, we compute a lower bound at each node that we visit, and we can safely ignore a subtree if its bound is larger than the current cost.
We use a very simple bounding function, which assumes that the flip-flops that have not yet been visited will be replaced by the SFFs with the lowest possible leakage. For example, at node 13, we compute the lower bound by assuming that FF5 has the leakage of an SF11 and that FF4 has that of the original low-Vt flip-flop. We also assume the leakage of an SF01 at FF1 and that of an SF00 at FF3. This yields a total leakage of 105, which is larger than the current cost of 97, computed at node 12. The priority of a flip-flop in the search tree is determined by the number of flip-flops that depend on it in the simplified table and by the minimum leakage it can have. In Fig. 11(i), FF5 is dependent only on FF3, and FF4 only on FF1; however, FF3 and FF1 are each dependent on two other flip-flops. From Fig. 12(a), we see that the minimum leakage that FF5 can have is 20, while the minimum for FF4 is 29. This is why FF5 is at the top of the search tree. By ordering the flip-flops in this way, we can find a lower bound on the leakage earlier in the search process. Even though branch and bound is computationally intensive, the search space for most practical circuits is narrow. This is because the flip-flops in a simplified table such as Fig. 11(i) lie on the critical path or on paths with slack close to zero, which are few in number, and the mutual dependences among them further restrict the feasible combinations. The SFFs for the other flip-flops can be determined from a straightforward consideration of leakage.
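For the flip-flops outside the simplified table, that leakage-based choice can be made directly from Eq. (1). The Python below is a minimal sketch of it: it estimates each candidate's idle leakage from $\rho_i$ and $\xi_i$ and keeps the cheapest among the allowed types. The per-state leakage numbers, the type labels, and the helpers `expected_leakage` and `best_type` are hypothetical placeholders, not values read from Fig. 7. Note that Eq. (1) multiplies the D and Q marginals, i.e., it treats the two signals as independent.

```python
# Hypothetical idle leakage (nA) per flip-flop type, indexed by (D, Q).
# Placeholder values for illustration only; they are NOT the Fig. 7 data.
LEAKAGE_NA = {
    "ORIG": {(0, 0): 8.0, (0, 1): 8.2, (1, 0): 8.2, (1, 1): 8.0},
    "SF00": {(0, 0): 5.0, (0, 1): 6.5, (1, 0): 6.5, (1, 1): 7.6},
    "SF01": {(0, 0): 6.5, (0, 1): 5.0, (1, 0): 7.6, (1, 1): 6.5},
    "SF10": {(0, 0): 6.5, (0, 1): 7.6, (1, 0): 5.0, (1, 1): 6.5},
    "SF11": {(0, 0): 7.6, (0, 1): 6.5, (1, 0): 6.5, (1, 1): 5.0},
}


def expected_leakage(l, rho, xi):
    """Eq. (1): l[(j, k)] is the leakage with D = j and Q = k;
    rho = P(D = 1) and xi = P(Q = 1) at the idle points."""
    return ((1 - rho) * (1 - xi) * l[(0, 0)]
            + (1 - rho) * xi * l[(0, 1)]
            + rho * (1 - xi) * l[(1, 0)]
            + rho * xi * l[(1, 1)])


def best_type(rho, xi, allowed=None, table=LEAKAGE_NA):
    """Return the lowest-leakage type among `allowed` (all types by default),
    e.g. for a flip-flop whose paths have enough slack for any SFF."""
    candidates = table if allowed is None else allowed
    return min(candidates, key=lambda t: expected_leakage(table[t], rho, xi))


# A flip-flop that is usually idle with D = 1 and Q = 0:
print(best_type(rho=0.9, xi=0.1))  # SF10
print(round(expected_leakage(LEAKAGE_NA["SF00"], 0.5, 0.5), 3))  # 6.4
```

Passing `allowed` lets the same helper respect whatever candidates survive the timing analysis for a given flip-flop.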


Fig. 11. (a) Initial combinations of SFFs satisfying the timing constraint, (b) removing the inconsistent rows and columns, (c) removing the columns corresponding to the rows removed from (b), (d) reduced combinations of SFFs, (e) removing the rows and columns which correspond to higher leakages, (f) removing the columns corresponding to the rows removed from (e), (g) removing the redundant entries, (h) fixing the type of SFFs for FF2, and (i) final combinations of SFFs for further search. The leakages of SFFs are assumed to increase in the order of SF00, SF01, SF10, SF11, and original flip-flop.
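The first simplification steps of Fig. 11, building the table of timing-feasible type pairs for each flip-flop-to-flip-flop path and then discarding inconsistent rows and columns, can be sketched as below. This is a minimal sketch under simplifying assumptions: each flip-flop type is summarized by a single worst-case clock-to-Q delay and setup time (the rising/falling asymmetry of Table II is ignored), the timing numbers and function names are hypothetical, and the fixed-point loop is our arc-consistency-style reading of the inconsistency step. The inferior- and redundant-entry removals of Fig. 11(e)–(g) are not shown.

```python
from itertools import product

# Hypothetical (clock-to-Q, setup) times in ps for each flip-flop type.
# Placeholder integers for illustration; they are NOT the Table II values,
# and rising/falling transitions are collapsed into one worst-case number.
TIMING = {
    "ORIG": (80, 32),
    "SF00": (83, 31),
    "SF01": (82, 33),
    "SF10": (84, 32),
    "SF11": (82, 34),
}


def build_tables(ff_paths, cycle, timing=TIMING):
    """ff_paths maps (launch FF, capture FF) -> worst combinational delay.
    Returns {(i, j): set of (type_i, type_j) pairs meeting the cycle time},
    i.e. the 1-entries of a Fig. 11(a)-style table."""
    tables = {}
    for (i, j), delay in ff_paths.items():
        tables[(i, j)] = {
            (ti, tj)
            for ti, tj in product(timing, repeat=2)
            # A self-loop (i == j) must use the same type at both ends.
            if (i != j or ti == tj)
            and timing[ti][0] + delay + timing[tj][1] <= cycle
        }
    return tables


def initial_domains(ffs, po_paths=None, timing=TIMING):
    """Give every FF all types, minus those that break a path to a primary
    output (po_paths maps FF -> (delay, required arrival time))."""
    po_paths = po_paths or {}
    return {
        ff: {t for t in timing
             if ff not in po_paths
             or timing[t][0] + po_paths[ff][0] <= po_paths[ff][1]}
        for ff in ffs
    }


def prune_inconsistent(domains, tables):
    """Drop a type from a FF when some path leaves it with no compatible
    partner at the other end; repeat until nothing changes."""
    changed = True
    while changed:
        changed = False
        for (i, j), ok in tables.items():
            keep_i = {ti for ti in domains[i]
                      if any((ti, tj) in ok for tj in domains[j])}
            keep_j = {tj for tj in domains[j]
                      if any((ti, tj) in ok for ti in domains[i])}
            if keep_i != domains[i] or keep_j != domains[j]:
                domains[i] = keep_i
                domains[j] = keep_j
                changed = True
    return domains


tables = build_tables({("A", "B"): 500, ("B", "A"): 500}, cycle=614)
domains = prune_inconsistent(initial_domains(["A", "B"]), tables)
print(sorted(domains["A"]))  # ['ORIG', 'SF00', 'SF01', 'SF11']
```

In this toy case, SF10 is dropped at both flip-flops because its clock-to-Q delay leaves no capturing type with a short enough setup time on either path.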


Fig. 12. Branch-and-bound process for SFF search. (a) Table of flip-flop leakages. (b) Branching tree.
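A compact depth-first branch-and-bound in the spirit of Fig. 12 is sketched below. It assumes that the per-type leakages (for example, from Eq. (1)) and the pairwise tables that remain after simplification are already available, and that every flip-flop appearing in a table also appears in `leakage`. The bound is the one described in Section III-B: the partial cost plus the minimum leakage of every flip-flop not yet visited. The visiting order (fewer dependent flip-flops first, then lower minimum leakage) is our reading of the FF5/FF4 example; all names and numbers are hypothetical.

```python
import math


def branch_and_bound(leakage, tables):
    """leakage: FF -> {type: leakage} over that FF's remaining candidates.
    tables: (i, j) -> set of allowed (type_i, type_j) pairs.
    Returns (best_cost, best_assignment); (inf, None) if nothing is feasible."""
    # Flip-flops that share a table with each flip-flop.
    deps = {ff: set() for ff in leakage}
    for i, j in tables:
        if i != j:
            deps[i].add(j)
            deps[j].add(i)
    # Visit order: fewer dependences first, then lower minimum leakage.
    order = sorted(leakage,
                   key=lambda f: (len(deps[f]), min(leakage[f].values())))
    # rest[k] = sum of the minimum leakages of order[k:], for the bound.
    rest = [0.0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        rest[k] = rest[k + 1] + min(leakage[order[k]].values())

    best_cost, best_assign = math.inf, None

    def compatible(ff, t, assign):
        """Check type t for ff against every already-assigned flip-flop."""
        if (ff, ff) in tables and (t, t) not in tables[(ff, ff)]:
            return False
        for other, t_other in assign.items():
            if (ff, other) in tables and (t, t_other) not in tables[(ff, other)]:
                return False
            if (other, ff) in tables and (t_other, t) not in tables[(other, ff)]:
                return False
        return True

    def visit(k, cost, assign):
        nonlocal best_cost, best_assign
        if cost + rest[k] >= best_cost:
            return  # Bound: even the cheapest completion cannot improve.
        if k == len(order):
            best_cost, best_assign = cost, dict(assign)
            return
        ff = order[k]
        # Try cheaper candidates first so a good incumbent appears early.
        for t in sorted(leakage[ff], key=leakage[ff].get):
            if compatible(ff, t, assign):
                assign[ff] = t
                visit(k + 1, cost + leakage[ff][t], assign)
                del assign[ff]

    visit(0, 0.0, {})
    return best_cost, best_assign


leakage = {"A": {"ORIG": 8.0, "SF00": 5.0},
           "B": {"ORIG": 8.0, "SF00": 5.0, "SF01": 6.0}}
tables = {("A", "B"): {("ORIG", "ORIG"), ("ORIG", "SF00"),
                       ("ORIG", "SF01"), ("SF00", "SF01")}}
print(branch_and_bound(leakage, tables))  # (11.0, {'A': 'SF00', 'B': 'SF01'})
```

The sketch prunes when the bound merely equals the incumbent, since a tie cannot improve the cost; the text's "larger than" test would also be correct, just slightly less aggressive.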

IV. EXPERIMENTS AND ANALYSIS

TABLE III BENCHMARK CIRCUITS AND RUNTIME FOR SFF ALLOCATION

A. Results We performed experiments on a set of sequential circuits taken from the ISCAS and ITC benchmarks. We also included several circuits extracted from an audio codec, a cryptography core based on the Advanced Encryption Standard, and other open cores [27]. Benchmark circuits are summarized in Table III, where the number of gates in the combinational subcircuit and the number of flip-flops are shown. Each circuit was synthesized with SIS [28] and mapped into a gate library, which we based on a commercial 65-nm technology. The library consists of 124 cells: seven flipflops; five inverters and four buffers; NAND, NOR, AND, and OR gates with two, three, and four inputs, each in five different sizes; and eight kinds of AOI and OAI gates, each in three different sizes. Technology mapping was done using a weighted sum of area and delay as the cost function, and gate sizing was performed during the technology mapping process. We assumed that the (idle) input probability distribution is available (either specified by the designer or obtained by simulating the finite-state machine description). To represent a series of idle periods, each circuit was simulated for 10 000 cycles, and 100 points in time were randomly picked to represent a series of idle periods. The probabilities of the D-input and Q-output of each flip-flop were measured at those idle points for use in SFF allocation [see (2)], and the vectors for the primary inputs at the idle points were used with SPICE2 to derive the average leakage current that we report. Fig. 13 shows experimental results. For each circuit, the left bar corresponds to conventional method (mixed Vt on 2 Two circuits s15850 and s38584 are exceptions, which exceeded the capacity of the simulation. Their leakage was obtained by adding up the average leakage of each gate in the circuit.

combinational subcircuit), where we used a mixed-Vt allocation algorithm based on gate sensitivity [17], i.e., the extent to which the leakage and timing of a gate change with a change in Vt. The right bar corresponds to the leakage current, normalized to the total leakage of the conventional method, of the circuit obtained by our method (the SFF allocation procedure applied to the technology-mapped netlist, followed by mixed Vt applied to the combinational subcircuit, using the heuristic algorithm presented in the last section, which we implemented in SIS [28]). The total leakage current is reduced by 15% on average (up to 21%). The leakage of the flip-flops themselves is cut by 25% on average, which is an understandable consequence of the distribution of flip-flops after SFF allocation, shown in Fig. 14. Since the overall delay overhead of SFFs is not significant (less than the delay of an unloaded minimum-size inverter), as discussed in Section III-B, many flip-flops are converted to SFFs. Several benchmark circuits, such as s298 and s400, require only one HSFF to satisfy the original delay constraint. This is possible because, at one end of the critical path, the two rising and two falling setup times of the SFFs are less than those of the original flip-flops (see Table II), and at its other end, the timing of the critical path is maintained by an HSFF, which has no timing penalty in terms of either setup time or clock-to-Q delay. The flip-flops of circuit s15850 are all SFFs; this too is made possible by the reduced setup times of some SFFs, even though every clock-to-Q delay is increased by the SFFs. The leakage in the combinational logic remains largely unchanged, as shown in Fig. 13. For some benchmarks, it even decreases rather than increases, implying that some paths have more slack. This is partly attributable to the reduced setup times of some SFFs and partly to the heuristic nature of the sensitivity measure used in the mixed-Vt algorithm [17].

Fig. 13. Comparison of leakage current between the conventional mixed-Vt combinational circuit and our method. All numbers are normalized to the total leakage of the conventional mixed Vt.

Fig. 14. Distribution of flip-flops after SFF allocation, followed by the application of mixed Vt to the combinational subcircuits.

The runtime of SFF allocation (on an Intel Xeon 3.6-GHz processor with 2-GB main memory running Red Hat Linux 2.4) is shown in the last column of Table III. It is mainly determined by the branch-and-bound search, i.e., by the size of the branching tree (see Fig. 12), not by the number of flip-flops. This is why the arbiter circuit takes the longest time, even though it has only 11 flip-flops. However, the branching tree is very small for most circuits, as discussed in Section III-B.

Vt variation increases as device dimensions shrink: the standard deviation of the threshold voltage has been reported to be about 6% of its mean at 180-nm technology but about 11% at 65-nm technology [29]. Since leakage current varies significantly with process variation, it is important to assess leakage from a statistical point of view [30], [31]. For each of the ten small example circuits, we took two netlists (one from our method and the other from the conventional method) and simulated them with SPICE using the Monte Carlo method to obtain the leakage distribution under within-die (WID) process variations. This was done at the slow process corner (SS), which was also the base process corner for the other experiments in this paper. The ratios of the means and of the standard deviations of the two distributions are shown in the third and fourth columns of Table IV, respectively. For comparison, the second column shows the leakage ratio reported in Fig. 13, which corresponds to the case without WID variation (i.e., the deterministic case). Note that the ratios in the second and third columns are very close, even though the corresponding leakage currents themselves differ.
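The Monte Carlo comparison above can be illustrated with a toy model. This is a minimal sketch, not the SPICE flow used for Table IV: it assumes (hypothetically) that each device leaks I0·exp(−ΔVt/(n·vT)) with an independent Gaussian ΔVt per device (WID variation only), and that a netlist is reduced to a count of leaky normal-Lgate low-Vt devices plus low-leakage devices. The device counts, nominal currents, and the constants `SIGMA_VT` and `N_VT` are made-up illustrative values; `SIGMA_VT` merely echoes the roughly 11% figure quoted for 65 nm, assuming a 0.3-V mean Vt.

```python
import numpy as np

# Hypothetical device model (not from the paper): I = I0 * exp(-dVt / (n * vT)),
# with dVt ~ N(0, SIGMA_VT) drawn independently per device (WID variation only).
N_VT = 1.5 * 0.0259   # assumed slope factor n times thermal voltage vT, in volts
SIGMA_VT = 0.033      # assumed Vt standard deviation in volts (~11% of a 0.3-V Vt)

def leakage_samples(n_leaky, n_low, i_leaky=1.0, i_low=0.1,
                    n_samples=1000, rng=None):
    """Monte Carlo samples of total leakage (arbitrary units) for a netlist
    modeled as n_leaky leaky devices plus n_low low-leakage devices."""
    rng = np.random.default_rng() if rng is None else rng
    i0 = np.concatenate([np.full(n_leaky, i_leaky), np.full(n_low, i_low)])
    dvt = rng.normal(0.0, SIGMA_VT, size=(n_samples, i0.size))
    return (i0 * np.exp(-dvt / N_VT)).sum(axis=1)

rng = np.random.default_rng(2008)
conv = leakage_samples(n_leaky=1000, n_low=2000, rng=rng)  # toy "conventional" netlist
ours = leakage_samples(n_leaky=600, n_low=2400, rng=rng)   # toy netlist with fewer leaky devices

det_ratio = (600 * 1.0 + 2400 * 0.1) / (1000 * 1.0 + 2000 * 0.1)  # ratio without variation
mean_ratio = ours.mean() / conv.mean()
std_ratio = ours.std() / conv.std()
print(f"deterministic: {det_ratio:.3f}  mean: {mean_ratio:.3f}  std: {std_ratio:.3f}")
```

In this toy model every device shares the same lognormal multiplier, so the ratio of means stays close to the deterministic ratio even though both means are inflated by variation, loosely mirroring the closeness of the second and third columns of Table IV; the smaller standard deviation comes only from having fewer high-current devices.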
In all circuits, the standard deviation is also smaller with our method, i.e., our method yields circuits whose leakage varies less. This can be understood from the fact that our flip-flops use fewer normal-Lgate low-Vt devices; these are the leakiest devices and therefore contribute most to leakage variation. Fig. 15 shows the histogram of leakage after Monte Carlo simulation of irda_mas_reg. We also performed Monte Carlo simulation of the same netlists at the nominal (NN) and fast (FF) process corners, where leakage current is higher. The ratios of the means and standard deviations at these two corners are shown in the last four columns of Table IV. Comparing these with the corresponding ratios at SS (third and fourth columns), we observe that both the leakage saving and the reduction in variability grow at the leakier process corners; this is again due to the smaller number of normal-Lgate low-Vt devices in the flip-flops of the netlist from our method.

Fig. 15. Histogram of leakage after Monte Carlo simulation of irda_mas_reg.

TABLE IV LEAKAGE COMPARISON WITH PROCESS VARIATION. THE NUMBERS IN COLUMN MEAN DENOTE THE MEAN OF THE LEAKAGE DISTRIBUTION IN THE NETLIST FROM OUR METHOD, DIVIDED BY THE CORRESPONDING MEAN IN THE NETLIST FROM THE CONVENTIONAL METHOD. THE NUMBERS IN COLUMN STD (STANDARD DEVIATION) ARE DEFINED SIMILARLY

B. Effect of SFF Allocation on Slack Distribution

In Fig. 16, we compare the cumulative slack histograms of three different implementations of the s298, b03, and irda_fir_det benchmarks: all low Vt, mixed Vt applied to the combinational subcircuits in the conventional manner, and our approach. (For example, about 80% of nets have slacks of less than 181 ps in the all-low-Vt implementation of s298.) We see a clear shift of the histogram to the left from the all-low-Vt to the mixed-Vt implementation, which implies an increasing number of critical paths. However, the shift from mixed Vt (the conventional approach) to our approach is marginal, which is desirable from a manufacturing point of view. Note that a histogram that skews too far to the left implies many equally critical paths, which eventually harms the yield because of the many uncertainty factors [32] in the design process.

C. Mixed-Vt Flip-Flop and Mixed-Vt Gates

Instead of using SFFs, we may use mixed-Vt flip-flops (i.e., low- and high-Vt flip-flops) together with mixed-Vt combinational subcircuits. We implemented a simple algorithm that allocates mixed Vt to flip-flops in an arbitrary order (to the extent that the timing constraint can be satisfied). The algorithm is run with two orderings, first allocating mixed Vt to the combinational subcircuits and then to the flip-flops, and then in the reverse order; we use the order that yields less leakage. The result is shown in Fig. 17, where we plot the leakage normalized to that of the mixed-Vt combinational logic. It can readily be seen that SFFs reduce leakage more than mixed-Vt flip-flops in most circuits. The circuits that benefit more from mixed-Vt flip-flops (such as s400) have large inherent slacks, which cannot be fully exploited by SFFs.

D. Statistical Allocation of Mixed-Vt Gates

Due to increasing WID variation, allocating mixed Vt at a single process corner can be very pessimistic, because all devices are assumed to follow the same process parameters, which is very unlikely. Statistical allocation of mixed Vt can instead be performed while satisfying a timing constraint that targets a high percentile point (e.g., +3σ) of the probability density functions of signal arrival times, rather than a deterministic critical-path delay. We implemented a statistical static timing analysis engine [33], which is used by a routine for the statistical allocation of mixed Vt [34] that we also implemented. This routine was used to obtain circuits with mixed Vt in the combinational subcircuit alone, as well as circuits produced by SFF allocation followed by mixed Vt in the combinational subcircuit (our approach). The results are shown in Fig. 18. When mixed Vt is allocated statistically, more gates tend to be assigned high Vt [34], so the leakage from the combinational subcircuits decreases, as can readily be seen by comparing Table III and Fig. 18.

V. CONCLUSION

Although it is in widespread use, the current approach to mixed Vt is of limited value, since it considers only the combinational portion of a circuit, even though sequential elements contribute a nonnegligible, and sometimes significant, portion of the total leakage. We have proposed SFFs, which exhibit very unequal leakage and timing characteristics. This concept is general, and any kind of conventional flip-flop can be transformed into an SFF. We have presented a heuristic that substitutes SFFs for conventional flip-flops. An average leakage saving of 15% was observed when this approach was compared to the use of mixed Vt alone.


Fig. 16. Cumulative slack histogram of (a) s298, (b) b03, and (c) irda_fir_det in various implementations.

Fig. 17. Application of mixed Vt to both flip-flops and combinational subcircuits compared to our approach.

Fig. 18. Experimental results with statistical allocation of mixed Vt in combinational subcircuits.

APPENDIX

COMPUTATION OF IDLE-STATE PROBABILITIES

Fig. 19 shows the general form of a (Mealy-type) sequential circuit. When we substitute SFFs for ordinary flip-flops, we exploit the idle-state probabilities of the flip-flops, i.e., their state probabilities while the circuit is in idle mode. In general, the state probability of a flip-flop is the probability that its state is logic high. For a D flip-flop, the probabilities of the D-input and Q-output are the same and equal to the state probability. However, if we consider only a sequence of idle intervals, which are interleaved with active intervals, the probabilities of the D-input and Q-output differ. This is particularly true when flip-flops are disabled or the clock is gated while the circuit is idle, which is common practice in sequential circuit design.

Fig. 19. Sequential circuit.

The idle-state probabilities of the D-input and Q-output can be derived as follows. We suppose that the design starts from a state transition graph, which is a common starting point for designing finite-state machine controllers, and we also assume that the (idle) input probability distribution is available (either specified by the designer or obtained by simulating the finite-state machine description). The probability of the present state $Q_i = (Q_{0,i}, Q_{1,i}, \ldots, Q_{k-1,i})$, where $k$ denotes the number of flip-flops, can be obtained from the eigenvector of the transition matrix corresponding to the unit eigenvalue [21]. More specifically, the vector of present-state probabilities, denoted by $v$, can be computed [21] by

$$v^{T} P = v^{T} \tag{2}$$

$$\sum_{i=1}^{n_s} P_i = 1 \tag{3}$$

where $P$ is the conditional transition probability matrix whose element $p_{i,j}$ is the conditional probability of a transition from state $i$ to state $j$, i.e., the sum of the probabilities of the idle inputs that trigger a transition from $i$ to $j$. $P_i$ is the steady-state probability of state $i$, and thus a component of $v$, and $n_s$ indicates


the number of states. The probability that the Q-output of flip-flop $i$ is high, which we denote by $\xi_i$, is given by

$$\xi_i = \sum_{j=1}^{n_s} P_j\, Q_{i,j}. \tag{4}$$
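Equations (2)–(4) can be evaluated with a few lines of NumPy. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the 3-state, 2-flip-flop machine, its idle transition matrix `P`, and the state encoding `codes` (with `codes[j][i]` playing the role of Q_{i,j}) are hypothetical. It takes v as the left eigenvector of P for the unit eigenvalue, normalizes it as in (3), and applies (4); as a sanity check it also estimates v from a random walk on the same transition graph, in the spirit of the simulation-based alternative the appendix also mentions.

```python
import numpy as np

def stationary_distribution(P):
    """Solve v^T P = v^T subject to sum(v) = 1, i.e., Eqs. (2) and (3)."""
    P = np.asarray(P, dtype=float)
    # Left eigenvectors of P are right eigenvectors of P^T.
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))   # index of the unit eigenvalue
    v = np.real(eigvecs[:, k])             # eig may return complex arrays
    return v / v.sum()                     # normalize (also fixes the sign)

def q_high_probabilities(v, codes):
    """Eq. (4): xi_i = sum_j P_j * Q_{i,j}, with codes[j][i] = bit i of state j."""
    return np.asarray(codes, dtype=float).T @ v

def simulate_state_frequencies(P, n_steps, rng, start=0):
    """Empirical state frequencies from a random walk on the transition graph."""
    P = np.asarray(P, dtype=float)
    counts = np.zeros(len(P))
    state = start
    for _ in range(n_steps):
        state = rng.choice(len(P), p=P[state])
        counts[state] += 1
    return counts / n_steps

# Hypothetical idle-mode FSM with k = 2 flip-flops, encoded as (Q0, Q1).
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
codes = [[0, 0],   # S0: Q0 = 0, Q1 = 0
         [0, 1],   # S1: Q0 = 0, Q1 = 1
         [1, 1]]   # S2: Q0 = 1, Q1 = 1

v = stationary_distribution(P)
xi = q_high_probabilities(v, codes)
freq = simulate_state_frequencies(P, 20000, np.random.default_rng(0))
print("v  =", np.round(v, 4))     # exact values: 5/7, 1/7, 1/7
print("xi =", np.round(xi, 4))    # exact values: 1/7, 2/7
print("simulated v =", np.round(freq, 3))
```

A dense eigendecomposition is adequate for small state counts; for machines with many states, an iterative method such as repeatedly multiplying v^T by P would typically be preferred.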

The present-state probabilities, together with the input probabilities (see Fig. 19), are propagated [35] through the combinational subcircuit to yield the probability distribution of the next states, i.e., the probabilities of the D-inputs ($\rho_i$). If we have a structural description of a design, we can instead simulate the circuit with a sequence of (idle) input patterns, monitor the next and present states, and derive their probabilities.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions.

REFERENCES

[1] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, E. C. G. Mittal, D. P. Y. Chan, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti, “Design of the Power6 microprocessor,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2007, pp. 96–97.
[2] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS,” IEEE J. Solid-State Circuits, vol. 30, no. 8, pp. 847–854, Aug. 1995.
[3] T. Inukai, M. Takamiya, K. Nose, H. Kawaguchi, T. Hiramoto, and T. Sakurai, “Boosted gate MOS (BGMOS): Device/circuit cooperation scheme to achieve leakage-free giga-scale integration,” in Proc. Custom Integr. Circuits Conf., May 2000, pp. 409–412.
[4] H. Kawaguchi, K. Nose, and T. Sakurai, “A super cut-off CMOS (SCCMOS) scheme for 0.5-V supply voltage with picoampere standby current,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1498–1501, Oct. 2000.
[5] Y. Shin, S. Heo, H.-O. Kim, and J. Choi, “Simultaneous control of subthreshold and gate leakage current in nanometer-scale CMOS circuits,” in Proc. Asia South Pacific Des. Autom. Conf., Jan. 2007, pp. 654–659.
[6] K. Usami, N. Kawabe, M. Koizumi, K. Seta, and T. Furusawa, “Automated selective multi-threshold design for ultra-low standby applications,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2002, pp. 202–206.
[7] K.-S. Min, H. Kawaguchi, and T. Sakurai, “Zigzag super cut-off CMOS (ZSCCMOS) block activation with self-adaptive voltage level controller: An alternative to clock-gating scheme in leakage dominant era,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2003, pp. 400–502.
[8] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu, and T. Sakurai, “A 0.9-V, 150-MHz, 10-mW, 4 mm², 2-D discrete cosine transform core processor with variable threshold-voltage (VT) scheme,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1770–1779, Nov. 1996.
[9] L. T. Clark, M. Morrow, and W. Brown, “Reverse-body bias and supply collapse for low effective standby power,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 9, pp. 947–956, Sep. 2004.
[10] D. Duarte, Y.-F. Tsai, N. Vijaykrishnan, and M. J. Irwin, “Evaluating runtime techniques for leakage power reduction,” in Proc. Int. Conf. VLSI Des., Jan. 2002, pp. 31–38.
[11] H.-O. Kim, Y. Shin, H. Kim, and I. Eo, “Physical design methodology of power gating circuits for standard-cell-based design,” in Proc. Des. Autom. Conf., Jul. 2006, pp. 109–112.
[12] N. Ohkubo and K. Usami, “Delay modeling and static timing analysis for MTCMOS circuits,” in Proc. Asia South Pacific Des. Autom. Conf., Jan. 2006, pp. 570–575.
[13] B. Choi and Y. Shin, “Lookup table-based adaptive body biasing of multiple macros,” in Proc. Int. Symp. Quality Electron. Des., Mar. 2007, pp. 533–538.
[14] L. Wei, Z. Chen, M. Johnson, K. Roy, and V. De, “Design and optimization of low voltage high performance dual threshold CMOS circuits,” in Proc. Des. Autom. Conf., Jun. 1998, pp. 489–494.


[15] Q. Wang and S. Vrudhula, “Static power optimization of deep submicron CMOS circuits for dual VT technology,” in Proc. Int. Conf. Comput.-Aided Des., Nov. 1998, pp. 490–496.
[16] V. Sundararajan and K. K. Parhi, “Low power synthesis of dual threshold voltage CMOS VLSI circuits,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 1999, pp. 139–144.
[17] T. Karnik, Y. Ye, J. Tschanz, L. Wei, S. Burns, V. Govindarajulu, V. De, and S. Borkar, “Total power optimization by simultaneous dual-Vt allocation and device sizing in high performance microprocessors,” in Proc. Des. Autom. Conf., Jun. 2002, pp. 486–491.
[18] M. Ketkar and S. S. Sapatnekar, “Standby power optimization via transistor sizing and dual threshold voltage assignment,” in Proc. Int. Conf. Comput.-Aided Des., Nov. 2002, pp. 375–378.
[19] D. Lee and D. Blaauw, “Static leakage reduction through simultaneous threshold voltage and state assignment,” in Proc. Des. Autom. Conf., Jun. 2003, pp. 191–194.
[20] H. Kawaguchi and T. Sakurai, “A reduced clock-swing flip-flop (RCSFF) for 63% power reduction,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 807–811, May 1998.
[21] L. Benini and G. De Micheli, “State assignment for low power dissipation,” IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 258–268, Mar. 1995.
[22] S. Chattopadhyay and P. N. Reddy, “Finite state machine state assignment targeting low power consumption,” Proc. Inst. Elect. Eng.—Comput. Digit. Tech., vol. 151, no. 1, pp. 61–70, Jan. 2004.
[23] Q. Wang and S. B. K. Vrudhula, “An investigation of power delay trade-offs for dual Vt CMOS circuits,” in Proc. Int. Conf. Comput. Des., Oct. 1999, pp. 556–562.
[24] P. Gupta, A. B. Kahng, P. Sharma, and D. Sylvester, “Gate-length biasing for runtime-leakage control,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 8, pp. 1475–1485, Aug. 2006.
[25] S. Rusu, S. Tam, H. Muljono, D. Ayers, J. Chang, B. Cherkauer, J. Stinson, J. Benoit, R. Varada, J. Leung, R. D. Limaye, and S. Vora, “A 65-nm dual-core multithreaded Xeon processor with 16-MB L3 cache,” IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 17–25, Jan. 2007.
[26] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, “New paradigm of predictive MOSFET and interconnect modeling for early circuit simulation,” in Proc. Custom Integr. Circuits Conf., May 2000, pp. 201–204.
[27] Opencores. [Online]. Available: http://www.opencores.org/
[28] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, “SIS: A system for sequential circuit synthesis,” EECS Dept., Univ. California, Berkeley, CA, Tech. Rep. UCB/ERL M92/41, May 1992.
[29] C. Chiang and J. Kawa, Eds., Design for Manufacturability and Yield for Nano-Scale CMOS. New York: Springer-Verlag, 2007.
[30] H. Chang and S. Sapatnekar, “Full-chip analysis of leakage power under process variations, including spatial correlations,” in Proc. Des. Autom. Conf., Jun. 2005, pp. 523–528.
[31] S. Bhardwaj and S. Vrudhula, “A fast and accurate approach for full chip leakage analysis of nano-scale circuits considering intra-die correlations,” in Proc. Int. Conf. VLSI Des., Jan. 2007, pp. 589–594.
[32] X. Bai, C. Visweswariah, P. N. Strenski, and D. J. Hathaway, “Uncertainty-aware circuit optimization,” in Proc. Des. Autom. Conf., Jun. 2002, pp. 58–63.
[33] J.-J. Liou, K.-T. Cheng, S. Kundu, and A. Krstic, “Fast statistical timing analysis by probabilistic event propagation,” in Proc. Des. Autom. Conf., Jun. 2001, pp. 661–666.
[34] A. Srivastava, D. Sylvester, and D. Blaauw, “Statistical optimization of leakage power considering process variations using dual-Vth and sizing,” in Proc. Des. Autom. Conf., Jun. 2004, pp. 773–779.
[35] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Riccó, “Estimate of signal probability in combinational logic networks,” in Proc. Eur. Test Conf., Apr. 1989, pp. 132–138.

Jun Seomun received the B.S. and M.S. degrees in electrical engineering from KAIST, Daejeon, Korea, in 2005 and 2007, respectively, where he is currently working toward the Ph.D. degree in the Department of Electrical Engineering, KAIST. His current research interests include leakage power optimization for VLSI circuits.


Jae-Hyun Kim received the B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 2006 and the M.S. degree in electrical engineering from KAIST, Daejeon, Korea, in 2008. He is currently with the System LSI Division, Semiconductor Business, Samsung Electronics, Yongin, Korea. His research interests include VLSI design methodology and computer-aided design for low-power integrated circuits.

Youngsoo Shin (M’00–SM’05) received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea. From 2000 to 2001, he was with the University of Tokyo, Tokyo, Japan, as a Research Associate. From 2001 to 2004, he was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, as a Research Staff Member. He has been with the Department of Electrical Engineering, KAIST, Daejeon, Korea, since 2004, where he is currently an Associate Professor. His research interests include computer-aided design with emphasis on low-power, statistical, and high-level and logic-level designs and design tools. Dr. Shin received the Best Paper Award at the 2005 ISQED and was nominated for the Best Paper Award at the same conference in 2007. He has been a member of the technical program committee and organizing committee of several technical conferences, including ICCAD, ISLPED, ASP-DAC, CASES, and ISCAS. He is a member of the Low-Power Technical Committee of the ACM Special Interest Group on Design Automation.
