
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 7, JULY 2007

Supply Switching With Ground Collapse: Simultaneous Control of Subthreshold and Gate Leakage Current in Nanometer-Scale CMOS Circuits

Youngsoo Shin, Senior Member, IEEE, Sewan Heo, Hyung-Ock Kim, Student Member, IEEE, and Jung Yun Choi, Member, IEEE

Abstract—Power gating has been widely used to reduce subthreshold leakage. However, the efficiency of power gating degrades very fast with technology scaling, which we demonstrate by experiment. This is due to the gate leakage of circuits specific to power gating, such as storage elements and output interface circuits with a data-retention capability. A new scheme called supply switching with ground collapse is proposed to control both gate and subthreshold leakage in nanometer-scale CMOS circuits. Compared to power gating, the leakage is cut by a factor of 6.3 with 65-nm and 8.6 with 45-nm technology. Various issues in implementing the proposed scheme using standard-cell elements are addressed, from register transfer level to layout. These include the choice of standby supply voltage with circuits that support it, a power network architecture for designs based on standard-cell elements, a current switch design methodology, several circuit elements specific to the proposed scheme, and the design flow that encompasses all the components. The proposed design flow is demonstrated on a commercial design with 90-nm technology, and a leakage saving by a factor of 32 is observed, with increases of 3% and 6% in area and wirelength, respectively.

Index Terms—Leakage, low-power, power gating, semicustom, standard cell.

I. INTRODUCTION

Subthreshold leakage current grows exponentially with every process generation, due to the scaling down of the threshold voltage. Reducing subthreshold leakage has, therefore, been conceived as a key to achieving low standby power. Many circuit-level approaches have been proposed to control leakage, especially when circuits are in standby. Lowering the supply voltage during standby mode [1] was proposed as a way of reducing both components of power: the voltage itself and the leakage current. It was shown that both subthreshold and gate leakage rapidly diminish with the supply voltage. However, its application to large-scale design such as VLSI has not been explored. Reverse body bias [2], [3] modulates the body bias so that the MOSFET threshold voltage is effectively increased, which

Manuscript received June 19, 2006; revised December 27, 2006. A preliminary version of this paper was presented at the 12th Asia and South Pacific Design Automation Conference, Yokohama, Japan, January 23–26, 2007. This work was supported by Samsung Electronics. Y. Shin and H.-O. Kim are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 305701, Korea (e-mail: [email protected]). S. Heo is with the Electronics and Telecommunications Research Institute (ETRI), Daejeon 305-700, Korea. J. Y. Choi is with the Samsung Electronics, Yongin, Gyeonggi-Do 449-711, Korea. Digital Object Identifier 10.1109/TVLSI.2007.899228

reduces standby leakage. This can be implemented by raising the ground rails, while the nMOS body is tied to a constant potential, but only to the point where losing state can be avoided [3]. In the virtual power/ground rail clamp approach [4], the power and ground rails both collapse toward each other due to the built-in diode clamps. However, neither technique can suppress the leakage current of combinational logic.

Power gating [5]–[7] suppresses standby leakage by cutting off a circuit from its power supply. Selective MTCMOS [8] and zigzag power gating [9] achieve fast transition to and from standby mode, which is appropriate for real-time applications. Power gating is especially popular and has been widely used in the semiconductor industry [10]–[13]. Although it is efficient in controlling subthreshold leakage, it suffers from a gate oxide direct tunneling current (gate leakage, for brevity). Furthermore, gate leakage grows very fast with CMOS technology scaling, even faster than subthreshold leakage, due to the scaling down of the gate oxide thickness. In fact, for CMOS technology of 90 nm and below, gate leakage is comparable to or exceeds subthreshold leakage [14].

In this paper, we show that the efficiency of power gating does indeed degrade very fast with technology scaling and also with temperature. The reduction in efficiency is due to the gate leakage of circuits specifically associated with power gating, such as storage elements and output interface circuits with a data-retention capability, and current switches. In order to overcome the efficiency limitation of power gating circuits, we propose a new circuit technique, which we call supply switching with ground collapse (SSGC). This reduces standby gate leakage by switching to a lower-voltage supply, while current switches, through which ground collapses in standby mode, suppress subthreshold leakage.
We address various issues in the design of SSGC: choice of a low-voltage standby supply and design of supply switching circuits, design of power network and current switches, and the design of SSGC-specific circuits. We also discuss the design flow for SSGC using standard-cell elements, starting from the register transfer level (RTL) description of a circuit and going down to a layout. A range of experiments shows that, compared to power gating, SSGC reduces leakage by a factor of 5 to 7 at 65 nm and 6 to 11 at 45 nm. The proposed design flow is demonstrated on a commercial design, showing the validity of the proposed method.

The remainder of this paper is organized as follows. In Section II, we study how the efficiency of power gating circuits varies with technology scaling. We propose SSGC in Section III and deal with various aspects of SSGC design using

1063-8210/$25.00 © 2007 IEEE Authorized licensed use limited to: Korea Advanced Institute of Science and Technology. Downloaded on December 9, 2009 at 19:58 from IEEE Xplore. Restrictions apply.

SHIN et al.: SUPPLY SWITCHING WITH GROUND COLLAPSE


Fig. 1. Power gating circuit.

standard-cell elements. Experimental results are presented in Section IV, and the application of SSGC to a commercial design is studied in Section V. We draw conclusions in Section VI.

II. EFFECT OF TECHNOLOGY SCALING ON EFFICIENCY OF POWER GATING

Power gating [15] is realized by placing a current switch, called a footer, in series with a logic block, as shown in Fig. 1. A header, which is a pMOS switch placed between the power supply and the logic block, can also be used. Another alternative is to use both header and footer. When the power management unit (PMU) detects a sufficiently long period of idle time, it turns off the footer to disconnect the logic block from the ground rail. When it subsequently detects that the logic block is required, the PMU turns on the footer again so that the logic block is reconnected to the power rails. The rail between the logic block and the footer in Fig. 1 serves as a virtual power rail (a virtual ground) for the logic block, which usually employs a low threshold voltage to sustain its performance. The footer, however, can have either a low or a high threshold voltage. The use of a high threshold voltage is called MTCMOS power gating [5].

Since storage elements such as flip-flops and latches lose their states in standby mode, alternative elements, which are capable of state retention, must be used. They are sometimes called state-retention storage elements. There are several variants [10], [12], [16], but most of these elements require all three power rails (supply, ground, and virtual ground), since part of the storage element must be connected to the supply and ground rails in order to retain its state, while the remaining parts are power gated and thus connected to the supply and virtual ground rails.

The outputs also need careful design. During a transition from active to standby mode, the outputs tend to rise gradually toward the supply voltage. This would lead to a large short-circuit current in the blocks that are connected to the outputs, as well as logical errors in the outputs themselves. A circuit, labeled the output-holding circuit in Fig. 1, needs to be inserted at each output to preserve the logic during standby mode [12], [16].

Power gating offers a substantial saving in subthreshold leakage. A reduction of three orders of magnitude is commonly observed in 180-nm technology. However, the presence of state-retention storage elements, output-holding circuits, and footers causes gate leakage, and this cannot be eliminated by power gating. This also holds true for power gating circuits that use a header, or both a header and a footer.

In order to understand how the efficiency of power gating varies with the technology, and with temperature, we will look at two ISCAS benchmark circuits: s344 and s1269. In both of

Fig. 2. (a) State-retention flip-flop and (b) output-holding circuit.

these circuits, we replace each flip-flop with the state-retention flip-flop [10] shown in Fig. 2(a), and insert the output-holding circuit [12] shown in Fig. 2(b) at each primary output. Footers are sized based on the average current [17], assuming that a 10% increase in delay can be tolerated. The final circuit is simulated with SPICE, and the total leakage current is then compared to that of the original circuit. The experiment was repeated for 180-, 65-, and 45-nm predictive technologies [18] and also for a commercial 90-nm technology, while varying the temperature between −40 °C and 120 °C.

Fig. 3 shows the results. As expected, power gating is very efficient when subthreshold leakage dominates the standby current (e.g., 180-nm technology at high temperature). But its efficiency quickly degrades with decreasing temperature, because of the declining benefit of using a high threshold voltage (in the state-retention flip-flops, output-holding circuits, and footers) to suppress subthreshold leakage. This is because subthreshold leakage itself decreases with temperature, and the difference in leakage between devices of low and high threshold voltage becomes less marked, since subthreshold leakage depends exponentially on temperature.

More importantly, the reduction in leakage achieved by power gating is much less significant in 90-, 65-, and 45-nm technologies. For these technologies, we find that the gate leakage of state-retention storage elements and output-holding circuits dominates the total leakage current, and these effects are not eliminated by power gating. Fig. 4 shows the proportion of gate leakage (at room temperature) of state-retention storage elements and output-holding circuits of the example circuits. It clearly shows that gate leakage is the dominant component of total leakage in these technologies.

In order to see the impact of power gating in a physical layout, we realized layouts for the circuits shown in Fig. 2, and combined them with a commercial 180-nm gate library.
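The footer sizing step used in this experiment (average current plus a tolerated delay increase) can be illustrated with a first-order sketch. This is not the formulation of [17]; it assumes that the fractional delay increase is roughly the voltage drop across the switch divided by the gate overdrive, and that the switch behaves as a linear resistor whose on-resistance scales inversely with width. The function names, the parameter `r_on_per_um`, and all numbers below are hypothetical.

```python
def max_rail_drop(v_dd, v_t, delay_penalty):
    """Largest tolerable voltage drop across the switch, in volts.

    Assumption (not from the paper): the fractional delay increase is
    approximately drop / (v_dd - v_t).
    """
    return delay_penalty * (v_dd - v_t)


def size_switch_width(i_avg, v_dd, v_t, delay_penalty, r_on_per_um):
    """Return the switch width (um) that keeps the I*R drop within budget.

    r_on_per_um is the on-resistance of a 1-um-wide switch (ohm * um);
    it is a hypothetical technology parameter.
    """
    if i_avg <= 0:
        raise ValueError("average current must be positive")
    r_max = max_rail_drop(v_dd, v_t, delay_penalty) / i_avg  # ohms allowed
    return r_on_per_um / r_max


# Illustrative values only: 1.0-V supply, 0.22-V logic threshold,
# 10% delay budget, 2 mA average current, 1 kohm*um switch resistance.
width_um = size_switch_width(2e-3, 1.0, 0.22, 0.10, 1000.0)
print(f"footer width ~ {width_um:.1f} um")
```

Under this model the width scales linearly with the average current and inversely with the delay budget, which is why a tighter delay target translates directly into a larger switch.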
We performed automatic placement and routing [19] of s1269. Compared to the original circuit, the area increased by 43% and the wirelength increased by 33%. These substantial increases


Fig. 3. Efficiency of power gating circuits: (a) s344 and (b) s1269.

Fig. 4. Proportion of gate leakage in power gated circuits.

in area and wirelength are mainly due to the state-retention flip-flops and output-holding circuits (in Fig. 2, B1, B2, and Standby indicate the signals coming from the PMU).

III. SUPPLY SWITCHING WITH GROUND COLLAPSE

As shown in Section II, the efficiency of power gating degrades with technology scaling due to the presence of gate leakage in state-retention storage elements and output-holding circuits. Using SSGC, the gate leakage component in storage elements is reduced by dropping the supply voltage, while power gating largely eliminates subthreshold leakage in the combinational circuits.

Fig. 5 shows the SSGC concept. When the circuit is in active mode, the normal supply voltage is applied through the supply control switches and the footer is turned on. When the PMU detects¹ that the circuit is in standby state, it steers the supply control switches so that the standby supply voltage is applied to the circuit. At the same time, the footer is turned off and subthreshold leakage from the combinational logic is eliminated. Note that some parts of the storage elements are connected to the footer, while the remaining parts bypass the footer and are directly connected to the ground rail. This allows us to use conventional storage elements with only slight modification, while we maintain the states in standby state. This solves the two main problems of conventional power gating: gate leakage and the overhead of the state-retention storage elements.

Fig. 5. Supply switching with ground collapse.

The standby supply voltage is considerably lower than the normal supply voltage, significantly reducing the standby gate leakage, since gate leakage falls rapidly as the supply voltage is reduced [20]. The standby voltage should be chosen so that the potential that drives the logic block (the virtual supply voltage in Fig. 5) is higher than the minimum voltage necessary for the storage elements to retain their states, plus some margin to guarantee state retention in the presence of noise.

¹There are many alternative power management interfaces and the full range of possibilities is beyond the scope of this paper. For example, a circuit may have internal logic that detects its own standby state and sends a standby request to the PMU. The PMU may then acknowledge the request depending on the configuration of the whole system. The same logic can be used to detect the wakeup condition and to interface with the PMU to achieve a return to active mode.

A. Choice of the Standby Supply Voltage and Design of Supply Switching Circuits

The key ingredient of SSGC is the supply voltage in standby mode, which should be as low as possible to reduce gate leakage but, at the same time, high enough to maintain the states of the storage elements. Temperature and process variation can affect the integrity of states at the reduced standby voltage, and need to be taken into account when we determine the standby virtual supply voltage (note that the standby virtual supply voltage is only used to choose the standby supply voltage, and is itself not constant). For example, we used SPICE to simulate an implementation in commercial 90 nm of the flip-flop shown in Fig. 6, and determined the lowest voltage that retains state for temperatures between −40 °C and 125 °C, which we assumed to be a realistic operating range. We repeated the experiment for different process corners to allow for process variation: each plot in Fig. 6 corresponds to a different process corner. Fig. 6 shows that at


least 260 mV is required to power this flip-flop reliably in the presence of temperature and process variations. If a design has several types of storage element, these experiments need to be repeated for each type, and the maximum necessary voltage can then be assumed as the standby virtual supply voltage.

Fig. 6. Low supply voltage for state retention.

The supply switching circuits in Fig. 5 can be designed as two MOS switches, as shown in Fig. 7. The normal supply voltage is supplied through M1, which is a pMOS switch with a high threshold voltage. Using a device with a high threshold can reduce the subthreshold leakage of M1, which is turned off in standby mode. In active mode, the virtual supply voltage is lower than the normal supply voltage due to the voltage drop across M1, which increases the circuit delay. This implies that the sizing of M1 is important for circuit performance. The wake-up delay, which is the delay in switching from standby to active mode (i.e., the time needed to turn off M2 and turn on M1), is also dependent on the size of M1. For M2, a low threshold voltage usefully reduces its size. This may increase the subthreshold leakage of M2, but that has little impact since M2 is turned off only in active mode, while the leakage of concern occurs during standby mode.

Fig. 7. Supply switching circuits with M2 implemented in (a) nMOS and (b) pMOS device.

The polarity (nMOS versus pMOS) and the size of M2 are important determinants of the total leakage current, since they affect the choice of the standby supply voltage, which in turn determines the gate leakage. Note that the standby supply voltage has to be higher than the minimum voltage needed for state retention, since there is a voltage drop across M2 in standby mode. In other words, the virtual supply voltage varies with temperature, once the standby supply voltage is determined and fixed. In fact, it is the lowest at the maximum temperature, because M2 then needs to supply the large subthreshold as well as gate leakage current of the circuit, and thus has the largest drop across its drain and source. The virtual supply voltage is the highest, and usually close to the standby supply voltage, at the minimum temperature due to the small leakage current of the circuit.

Our experiments show that the use of an nMOS switch generally reduces the leakage current, and is therefore preferable for M2. At most temperatures, the total leakage current in the circuit with an nMOS switch is less than half of that with a pMOS switch. This can be understood from the observation that the required standby supply voltage is higher for a pMOS transistor at the maximum temperature (recall that the standby supply voltage is determined at the maximum temperature), which leads to higher values of the virtual supply voltage at lower temperatures, making the total leakage greater than it would be with an nMOS switch.

The size of the nMOS switch M2 should be chosen to minimize the total leakage, so long as the area overhead from the switch can be tolerated. Simulating a circuit to determine the total leakage while varying the size of M2 can take a significant amount of time. Fortunately, it can be shown through experiment that the total leakage approximately correlates with the standby supply voltage (e.g., see Fig. 8, which corresponds to the industrial example used in Section V). Sizing can, therefore, be performed more conveniently by simulating the M2 switch alone. We first fix the drain (or source) of the M2 switch to the minimum voltage that supports state retention (i.e., we assume that the virtual supply voltage is constant). We also assume that the M2 switch needs to supply the average leakage current of the circuit, which is obtained through simulation (assuming the slow corner to take process variation into account). Then, we obtain the voltage of the source (or drain) of the M2 switch, which is the standby supply voltage, while we change the nMOS size. This gives us a plot similar to Fig. 8.

Fig. 8. Correlation of the standby supply voltage and total leakage current.

B. Semicustom Design Methodology

In this subsection, we discuss various issues that arise in applying SSGC to semicustom designs based on standard-cell elements.
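The standby-voltage selection and M2 sizing procedure of Section III-A can be sketched as follows. The paper obtains these numbers from SPICE; the sketch below replaces the simulated M2 device with a linear on-resistance that scales inversely with width, which is a simplifying assumption, and all function names, cell names, and numeric values are hypothetical.

```python
def required_retention_voltage(min_voltage_by_cell):
    """Worst-case retention voltage over all cell types, corners, and temperatures.

    min_voltage_by_cell maps cell type -> {(corner, temp_c): volts}, where each
    value is the lowest voltage at which that cell still holds its state.
    """
    return max(max(points.values()) for points in min_voltage_by_cell.values())


def standby_supply_for_width(v_retention, i_leak_avg, r_on_per_um, width_um):
    """Standby supply needed on the far side of M2 for a given M2 width.

    Assumption: M2 is modeled as a resistor of r_on_per_um / width_um ohms,
    so the supply must exceed the retention voltage by the I*R drop.
    """
    return v_retention + i_leak_avg * (r_on_per_um / width_um)


def sweep_m2(v_retention, i_leak_avg, r_on_per_um, widths_um):
    """Return (width, standby supply) pairs for a list of candidate widths."""
    return [(w, standby_supply_for_width(v_retention, i_leak_avg, r_on_per_um, w))
            for w in widths_um]


# Hypothetical characterization data for two storage-element types.
measured = {
    "dff_a": {("slow", -40): 0.25, ("fast", 125): 0.27},
    "latch_a": {("slow", -40): 0.23, ("fast", 125): 0.26},
}
v_ret = required_retention_voltage(measured)

# Hypothetical average leakage (100 uA) and switch resistance (5 kohm*um).
table = sweep_m2(v_ret, 100e-6, 5000.0, [5.0, 10.0, 20.0, 40.0])
for width, v_standby in table:
    print(f"M2 width {width:5.1f} um -> standby supply {v_standby:.4f} V")
```

A wider M2 lets the standby supply sit closer to the retention voltage, which lowers gate leakage at the cost of switch area; this is the tradeoff the sizing sweep is meant to expose.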


Fig. 9. Power networks for supply switching with ground collapse.

Fig. 10. (a) Conceptual layout of a footer cell and (b) the layout of a footer with slices and isolators.

1) Power Networks: Fig. 5 shows that we need additional power networks for the virtual supply and the virtual ground, as well as conventional networks for the normal supply and ground. To meet this demand, we propose the new power network topology shown in Fig. 9. These networks consist of four power rings and corresponding power rails. The networks providing the normal supply and ground are connected to chip-level power networks, while the virtual supply and virtual ground networks are local. Note also that the virtual supply and virtual ground rails connect, respectively, to the supply and ground terminals of the cells implementing combinational logic, allowing unmodified conventional standard-cell logic elements to be used.

The locations of the footer and the M1 switch are important, since they affect the circuit operation in active mode. In Fig. 9, for example, they are located in the four quadrants of the placement region. Accurate analysis of the power network may be required, depending on the power delivery requirements (current, IR drop, electromigration, etc.) that need to be imposed. M1 receives the normal supply from a vertical rail that resides in a higher metal layer, and connects to the virtual supply rail through its terminal. Similarly, the footer connects to the virtual ground rail through its terminal, while connecting to ground via a horizontal rail in a higher metal layer. The placement of M2 is less important, since it only supplies the standby voltage. In Fig. 9, M2 switches are located in the four corners of the placement region.

Storage elements need the virtual supply and ground, as shown in Fig. 5. Because we modify the conventional storage elements slightly to reduce their subthreshold leakage, they require the virtual ground as well. (The detail is explained in Section III-B3.) As a result, the elements' supply and ground terminals connect to the virtual supply and virtual ground rails, while the connection to ground is made through their signal pins. The output-holding circuit receives two of the supply rails through its signal pins.

2) Footer Design: Fig. 10(a) shows a conceptual cell layout of a footer switch.
Its source and drain terminals are connected to the ground and virtual ground rails, respectively, while its virtual supply terminal merely serves as a connecting medium for the cells on its left- and right-hand sides. The body biasing of logic cells is implicit (the body of pMOS to the virtual supply and the body of nMOS to the virtual ground), since we do not modify any standard cell layout. However, the body of a footer can be biased either to its source or to its drain. This allows us a tradeoff between area overhead and leakage saving. If the body of a footer is connected to its drain (i.e., the virtual ground), the footer can share its body with logic gates, which makes the layout more compact. However, when a footer is turned off, the virtual ground rises toward the supply voltage, resulting in a p-n junction current in the footer, which is a disadvantage in terms of standby mode leakage. Conversely, if the body is connected to its source (i.e., ground), the leakage situation improves, although there is an area overhead due to the need for well isolation. We chose this second option to minimize leakage.

Note that SSGC requires triple-well CMOS technology. However, this can be avoided if the body of the logic nMOS (in addition to that of the footer) is biased to ground. This can be accomplished by removing all the substrate contacts and by using tap cells to provide body bias [3]. This, of course, comes at a cost of reduced performance due to body effect.

To cope with the area overhead due to well isolation, we build a footer by combining two types of cells, which we call a slice and an isolator. A slice is a unit footer; when slices are abutted together, they constitute a larger footer. Isolators are placed at both ends of the slices so that there is guaranteed to be enough room between the footer and the logic cells for well isolation. Fig. 10(b) shows a footer constructed by abutting three slices with two isolators. The footer needs to be placed in an isolated p-well, which in turn needs to be inside an n-well for well isolation. The extra spaces required are denoted by A, B, C, and D.

Once the size (width) of a footer has been determined [17] from a given performance requirement, we know the number of slices that need to be placed. In terms of a simple tally of area, the best way to place slices is to abut them all together, since this requires only two isolators. But a single large footer can block placement of the logic cells. Furthermore, the power network (i.e., the virtual ground network) may experience a large IR drop if the logic cells are physically distant from the footer.
Instead, we place discrete footers in a regular pattern, as shown in Fig. 9.

3) SSGC-Specific Cells: While power gating requires state-retention storage elements, conventional storage elements can be used in SSGC, since they are not totally power gated and their standby leakage current is controlled by reducing the supply voltage. For example, in the testable flip-flop shown in Fig. 11(a), only a slave latch is connected to ground through serially connected nMOS switches, while the remainder of the circuit is power gated (i.e., connected to the virtual ground). Fig. 11(a) also shows a layout of the flip-flop. This choice slightly increases the area of


Fig. 11. SSGC-specific cells: (a) flip-flop and (b) output-holding circuit.

, can now be determined from the leakage M2, together with of the circuit at maximum temperature. An output-holding circuit is inserted at each primary output. We resynthesize the modified netlist to take the extra delay from the output-holding circuits and flip-flops into account. In the physical design stage, we first generate the convenand tional power and ground networks, which serve as networks in SSGC. Combining these with the extra netand , we now have our power networks, as works for shown in Fig. 9. The slice blocks for a footer, and the M1 and M2 switches, are placed in a regular fashion and then fixed in their locations. After the placement of the logic cells, the signal routing and the routing of the standby signal can be performed. The transistor-level netlist is extracted from the layout and simulated with SPICE to estimate the leakage current. Fig. 12. Design flow.

IV. EXPERIMENTAL RESULTS the storage elements but does not affect the wirelength, since these elements are not controlled by the PMU. Like power gating, SSGC needs an output-holding circuit, since the outputs of a circuit are driven by reduced voltage in standby mode, while the blocks that are connected to the out(either because they are in active puts may be driven by mode or because they do not employ SSGC). Fig. 11(b) [12] shows an example of output-holding circuit and its layout. 4) Design Flow: The design flow for SSGC is shown in Fig. 12, where SSGC-specific steps are highlighted. The RTL design goes through a traditional logic synthesis to create the gate-level netlist. In order to determine the size of M1 and the footer, we first apply random logic patterns to the inputs of the netlist, simulate it with a circuit simulator, which gives us the average current. Combining this result with the target delay penalty2 and the turn-on resistance of a minimum-size MOS transistor gives us the size of M1 and the footer [17]. The size of 2We assume that the footer and M1 contribute equally to the delay increase in the circuit. In other words, both footer and M1 take equal responsibility for the delay penalty.

We performed experiments on a set of circuits taken from the ISCAS'89 and ITC'99 benchmarks. In Table I, the second and the three subsequent columns show the characteristics of the original circuits. The remaining columns show the leakage savings achieved by power gating and SSGC (as factors), for implementation in 65- and 45-nm predictive technologies, all at room temperature. For 65 nm, the low and high threshold voltages of the nMOS device are 220 and 389 mV, respectively; those of the pMOS device are 220 and 424 mV, respectively. For 45 nm, nMOS devices have low and high threshold voltages of 220 and 457 mV, respectively; the values for pMOS are 220 and 486 mV.

To implement power gating, we used state-retention flip-flops and output-holding circuits, as shown in Fig. 2. The sizing of the footers was based on the average current, assuming that a 10% increase in delay can be tolerated. For the SSGC implementation, we assumed the same 10% delay penalty for sizing M1 and the footer, and we used the circuits shown in Fig. 11 for flip-flops and outputs.

The leakage saved by SSGC increases with technology scaling, since the gate leakage now takes a higher proportion


Fig. 13. Leakage breakdown of s3384: (a) original circuit, (b) after power gating, and (c) after SSGC.

of the total leakage current. The leakage is cut by a factor of 51 with 65-nm technology and a factor of 56 with 45-nm technology, on average. Conversely, and for the same reason, the saving from power gating decreases as the technology is scaled down. On average, compared to power gating, the leakage is cut by a factor of 6.3 with 65-nm and 8.6 with 45-nm technology. These results demonstrate the efficiency of SSGC as technology scales down and power gating becomes less effective. The ability of SSGC to save leakage is determined by the number of storage elements and outputs in the original circuit, since they are the main sources of leakage current in standby mode (compare b03 and b14, for example).

Fig. 13 clearly shows the effect of applying power gating and SSGC to s3384. After power gating, subthreshold leakage is eliminated and the gate leakage of flip-flops becomes the dominant factor in total leakage, as shown in Fig. 13(b). The transistors that are connected to the inputs, output-holding circuits, and current switches are responsible for the rest of the gate leakage. On the other hand, both subthreshold and gate leakage are substantially reduced after SSGC, as shown in Fig. 13(c).

We repeated the experiments at different temperatures to explore the temperature dependency of both techniques. We have already seen how the efficiency of power gating varies with temperature in Section II (i.e., leakage saving decreases as temperature goes down). For SSGC, the leakage saving improves with decreasing temperature, since the subthreshold leakage gets smaller and the gate leakage becomes a higher proportion of the total.

It should be noted that the minimum voltage necessary for state retention, which was used to derive the total leakage reduction factor for SSGC, was obtained from only one process corner due to its availability [18]. Thus, if we take process corners into account (refer to Fig. 6) as well as margin for various noises, in practice the standby voltage can be much higher than that used for our experimental result in Table I. In order to understand the effect of the standby voltage on the total leakage reduction factor of SSGC, we will look at two benchmark circuits: s344 and s1269. Fig. 14 shows that even though we increase the standby voltage by 200 mV from the theoretical minimum, the leakage saving factor with SSGC does not degrade significantly.
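The reduction factors quoted in this section are ratios of standby currents. The arithmetic can be sketched as below, assuming the factor is defined as the original leakage divided by the leakage after a technique is applied; this reading is consistent with the 410 µA versus 13 µA currents reported for the case study in Section V. The function names and the two per-circuit factors in the second example are hypothetical, not values from Table I.

```python
def reduction_factor(i_original, i_reduced):
    """Leakage reduction factor: original current over reduced current."""
    if i_reduced <= 0:
        raise ValueError("reduced leakage must be positive")
    return i_original / i_reduced


def gain_over(factor_new, factor_baseline):
    """How many times more leakage the baseline technique leaves behind.

    For one circuit this equals I_baseline / I_new, because both factors
    share the same original current.
    """
    return factor_new / factor_baseline


# Case-study currents from Section V: 410 uA originally, 13 uA with SSGC.
case_study = reduction_factor(410e-6, 13e-6)
print(round(case_study))  # prints 32

# Hypothetical per-circuit factors, for illustration only.
print(gain_over(48.0, 8.0))  # prints 6.0
```

Note that an average of per-circuit gains is not the same as the ratio of the two average factors, so averaged values should not be expected to divide exactly.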

TABLE I. TOTAL LEAKAGE REDUCTION FACTOR (TEMPERATURE = 25 °C)

Fig. 14. Total leakage reduction factor of SSGC with 65-nm technology versus standby voltage.

V. CASE STUDY: EMBEDDED TRACE MACROCELL

In order to validate SSGC, we used an embedded trace macrocell (ETM) [21] as a test vehicle. An ETM provides debug and trace facilities for ARM processors, and allows information about the processor's state to be captured both before and after a specific event. The original design used in this experiment consists of 90 K gates after mapping onto a commercial 90-nm, 1.0-V gate library. The design has 124 outputs, and the same number of output-holding circuits is required for an SSGC implementation. There are 5.5 K storage elements, made up of four types of flip-flop and a single type of latch. To determine the standby voltage of the design, we first repeated a process similar to the one shown in Fig. 6 for each storage element. The highest of the voltages needed for state retention was 284 mV, which was then used to choose


the size of M2 (17.6 µm) and the standby supply voltage (about 310 mV)³ as shown in Fig. 8. The design then follows the flow in Fig. 12. Physical design was done flat, with floorplan constraints imposed on two large subblocks, as shown in Fig. 15.

Fig. 15. Final layout of ETM with SSGC.

The leakage of the original design was estimated by summing up the average leakage of each gate. To estimate the leakage of the design after SSGC implementation, we first characterized the average leakage of SSGC flip-flops, output-holding circuits, and current switches. We then summed up the average leakage of those elements, because they are the only sources of leakage in SSGC. SSGC reduces the leakage current by a factor of 32 at 25 °C (13 µA compared to 410 µA with the original design). The saving goes up at reduced temperatures, as we would expect, rising to a factor of about 130 at −40 °C. This result is in accordance with the experimental results presented in the previous section, implying that we could expect more savings in 65-nm and smaller technologies, where gate leakage takes a higher proportion of the total standby leakage current.

We analyze the contribution to leakage current by different components of the design. The storage elements (SEs) draw 12.8 µA and thus are responsible for most of the leakage current. This is understandable since their subthreshold leakage can be significant due to the use of a low threshold voltage (see Fig. 11), although leakage is reduced by a lower supply voltage in standby mode. Footers and output-holding circuits draw a negligible leakage current due to their use of high-threshold-voltage transistors. However, it should be noted that the relative contribution of different elements will differ depending on the proportion of gate leakage in the total leakage current.

We have also analyzed the overhead of using SSGC with this layout, in terms of area and wirelength. The area (sum of all cell areas) increases by 3% compared to that of the original circuit, which is almost negligible. The total wirelength increases by only 6%. It should be noted that this is a significant advantage of SSGC over power gating [19], in addition to its ability to suppress more leakage current. The reason for this small increase in area and wirelength is that SSGC does not use the PMU to control storage elements. In contrast, the state-retention storage elements used with power gating need control signals [B1 and B2 in Fig. 2(a)], which significantly increase the total wirelength.

³Note that the actual … various noises.

VI. CONCLUSION
It should be noted that this is a significant advantage of SSGC over power gating [19] in addition to its ability to suppress more leakage current. The reason for this small increase of area and wirelength is because SSGC does not use the 3Note that the actual various noises.

V

has to be higher than this to allow the margin for

Although power gating has been widely used to reduce subthreshold leakage, we have shown that its efficiency degrades very fast with technology scaling, limiting its applicability in nanometer-scale technologies, such as 65 and 45 nm. We have demonstrated by simulations that this is due to the gate leakage of circuits specific to power gating, such as state-retention storage elements and output-holding circuits. In order to overcome this limitation of power gating, we have proposed a new circuit technique called supply switching with ground collapse. We performed a range of experiments to compare the leakage with SSGC and with power gating. SSGC outperforms power gating by a factor of 5 to 7 at 65 nm and 6 to 11 at 45 nm. We have presented the design flow for applying SSGC to a semi-custom design using standard-cell elements, and we have demonstrated its feasibility on a commercial design using commercial 90-nm technology.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions.

REFERENCES

[1] B. H. Calhoun and A. P. Chandrakasan, “Standby power reduction using dynamic voltage scaling and canary flip-flop structures,” IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1504–1511, Sep. 2004.
[2] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu, and T. Sakurai, “A 0.9-V, 150-MHz, 10-mW, 4 mm², 2-D discrete cosine transform core processor with variable threshold-voltage (VT) scheme,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1770–1779, Nov. 1996.
[3] L. T. Clark, M. Morrow, and W. Brown, “Reverse-body bias and supply collapse for low effective standby power,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 9, pp. 947–956, Sep. 2004.
[4] K. Kumagai, H. Iwaki, H. Yoshida, H. Suzuki, T. Yamada, and S. Kurosawa, “A novel powering-down scheme for low Vt CMOS circuits,” in Proc. Symp. VLSI Circuits, 1998, pp. 44–45.
[5] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “A 1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS,” IEEE J. Solid-State Circuits, vol. 30, no. 8, pp. 847–854, Aug. 1995.
[6] T. Inukai, M. Takamiya, K. Nose, H. Kawaguchi, T. Hiramoto, and T. Sakurai, “Boosted gate MOS (BGMOS): Device/circuit cooperation scheme to achieve leakage-free giga-scale integration,” in Proc. Custom Integr. Circuits Conf., 2000, pp. 409–412.
[7] H. Kawaguchi, K. Nose, and T. Sakurai, “A super cut-off CMOS (SCCMOS) scheme for 0.5-V supply voltage with picoampere current,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1498–1501, Oct. 2000.
[8] K. Usami, N. Kawabe, M. Koizumi, K. Seta, and T. Furusawa, “Automated selective multi-threshold design for ultra-low standby applications,” in Proc. Int. Symp. Low-Power Electron. Design, 2002, pp. 202–206.
[9] K.-S. Min, H. Kawaguchi, and T. Sakurai, “Zigzag super cut-off CMOS (ZSCCMOS) block activation with self-adaptive voltage level controller: An alternative to clock-gating scheme in leakage dominant era,” in Proc. IEEE Int. Solid-State Circuits Conf., 2003, pp. 400–401.
[10] S. Shigematsu, S. Mutoh, Y. Matsuya, Y. Tanabe, and J. Yamada, “A 1-V high-speed MTCMOS circuit scheme for power-down application circuits,” IEEE J. Solid-State Circuits, vol. 32, no. 6, pp. 861–869, Jun. 1997.


[11] S. V. Kosonocky, M. Immediato, P. Cottrell, and T. Hook, “Enhanced multi-threshold (MTCMOS) circuits using variable well bias,” in Proc. Int. Symp. Low-Power Electron. Design, 2001, pp. 165–169.
[12] H.-S. Won, K.-S. Kim, K.-O. Jeong, K.-T. Park, K.-M. Choi, and J.-T. Kong, “An MTCMOS design methodology and its application to mobile computing,” in Proc. Int. Symp. Low-Power Electron. Design, 2003, pp. 110–115.
[13] P. Royannez, H. Mair, F. Dahan, M. Wagner, M. Streeter, L. Bouetel, J. Blasquez, H. Clasen, G. Semino, J. Dong, D. Scott, B. Pitts, C. Raibaut, and U. Ko, “90 nm low leakage SoC design techniques for wireless applications,” in Proc. IEEE Int. Solid-State Circuits Conf., 2006, pp. 138–139.
[14] N. Sirisantana and K. Roy, “Low-power design using multiple channel lengths and oxide thicknesses,” IEEE Design Test Comput., vol. 21, no. 1, pp. 56–63, Jan. 2004.
[15] S. G. Narendra and A. Chandrakasan, Eds., Leakage in Nanometer CMOS Technologies. New York: Springer, 2005.
[16] J. Kao and A. Chandrakasan, “MTCMOS sequential circuits,” in Proc. Eur. Solid-State Circuits Conf., 2001, pp. 317–320.
[17] S. Mutoh, S. Shigematsu, Y. Gotoh, and S. Konaka, “Design method of MTCMOS power switch for low-voltage high-speed LSIs,” in Proc. Asia South Pacific Design Autom. Conf., 1999, pp. 113–116.
[18] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, “New paradigm of predictive MOSFET and interconnect modeling for early circuit design,” in Proc. Custom Integr. Circuits Conf., 2000, pp. 201–204.
[19] H.-O. Kim, Y. Shin, H. Kim, and I. Eo, “Physical design methodology of power gating circuits for standard-cell-based design,” in Proc. Design Autom. Conf., 2006, pp. 109–112.
[20] R. K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, “High-performance and low-power challenges for sub-70 nm microprocessor circuits,” in Proc. Custom Integr. Circuits Conf., 2002, pp. 125–128.
[21] ARM, Cambridge, U.K., “Embedded trace macrocell,” [Online]. Available: http://www.arm.com/products/solutions/ETM.html

Youngsoo Shin (SM’05) received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea. He is currently an Associate Professor in the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. He has worked at the University of Tokyo, Tokyo, Japan, as a Research Associate, and IBM T. J. Watson Research Center, Yorktown Heights, NY, as a Research Staff Member. Dr. Shin was a recipient of a Best Paper Award at the 2005 International Symposium on Quality Electronic Design. He has been on the program committee for the International Symposium on Low-Power Electronics and Design, the International Conference on Computer-Aided Design, and the Asia and South Pacific Design Automation Conference.

Sewan Heo received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2005 and 2007, respectively. He is currently working on array processor design at the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea. His research interests include circuit design and optimization techniques for low-power VLSI systems.

Hyung-Ock Kim (S’03) received the B.S. degree in electronic engineering from Yonsei University, Korea, in 2002, and the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2004, where he is currently pursuing the Ph.D. degree in electrical engineering. His current research interests include physical design and leakage power optimization for VLSI circuits.

Jung Yun Choi (M’06) received the B.E. degree in electronics from Kyungpook National University, Daegu, Korea, in 1997, and the M.S. and Ph.D. degrees in electronic and electrical engineering from the Pohang University of Science and Technology, Pohang, Korea, in 1999 and 2003, respectively. Currently, he is a Senior Engineer at the CAE Team, System LSI Division, Samsung Electronics Co., Ltd., Gyeonggi-Do, Korea. His research interests include all aspects of the computer-aided design of integrated circuits, especially low-power design methodologies, power integrity, and power analysis.
