IEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 53, NO. 6, JUNE 2006
Local Clustering 3-D Stacked CMOS Technology for Interconnect Loading Reduction
Xinnan Lin, Shengdong Zhang, Member, IEEE, Xusheng Wu, and Mansun Chan, Senior Member, IEEE
Abstract—A three-dimensional (3-D) stacked CMOS technology is developed to closely pack devices in a number of standard cells to form local clusters. Based on the 3-D stacked CMOS technology, an analysis to extend the technology to implement standard cell-based integrated circuits is performed. It is found that the 3-D stacked CMOS technology can reduce the size of an overall IC by 50% with a significant reduction in interconnect delay. A thermal analysis is also performed. It is found that the rise in temperature in 3-D ICs can be lower than that of traditional planar ICs under the condition of the same propagation delay, since the power supply voltage required by 3-D ICs to achieve the same performance is lower.
Index Terms—Clustering technique, CMOS, three-dimensional integration.
I. INTRODUCTION
With the continuous scaling of active devices in CMOS technology, interconnect RC delay has become the major limitation in integrated circuit performance [1], [2]. Three-dimensional integrated circuits (3-D ICs) with multiple active layers stacked in the vertical direction have been proposed as a method to reduce the impact of interconnect loading by providing a shorter connection path between transistors through vertical wires [3]. However, studies of 3-D ICs have been based on the arbitrary placement of circuit elements in various layers, connected in an arbitrary manner. Such an approach results in a number of issues, including: 1) the need for special computer-aided design (CAD) tools that understand the specific 3-D technology in order to perform the vertical routing; 2) low vertical routing effectiveness due to nonstandard interconnects; and 3) a larger temperature rise on the upper layer during operation. Some methods, such as metal-induced lateral crystallization (MILC) [4] and wafer bonding [5], have been developed to increase the circuit density and routing effectiveness. However, the vertical interconnect methodology is still a major problem.
In this paper, we take a different approach toward 3-D integration. We use the local clustering technique such that only the standard cells in the design library are designed with closely coupled 3-D stacked transistors. By making the cell size smaller, the total lateral interconnect length can be reduced. A technology to fabricate high-performance 3-D circuits has been developed. The performance of the technology when applied to circuit design has been studied by circuit simulation. The total chip area, power dissipation, critical path delay, and temperature increase have been compared with those of the same circuit designed with the conventional two-dimensional (2-D) approach.

Manuscript received August 18, 2005; revised December 16, 2005. This work was supported by the Research Grant Council of Hong Kong under Grant HKUST6289/04E and the NSFC under Project 9040701. The review of this paper was arranged by Editor C. Y. Lu.
X. Lin, X. Wu, and M. Chan are with the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong.
S. Zhang is with the Institute of Microelectronics, Peking University, Beijing 100871, China.
Digital Object Identifier 10.1109/TED.2006.874157
0018-9383/$20.00 © 2006 IEEE

Fig. 1. Schematic drawing of the 3-D stacked CMOS inverter that consists of a double-gate PMOS on top of a single-gate NMOS. (a) Starting wafer and (b) cross-section of the stacked 3-D transistors using the local clustering concept. Comparing the layout of the (c) conventional 2-D inverter and (d) 3-D inverter, the 3-D IC can achieve over 50% area reduction.

II. CLOSELY COUPLED 3-D STACKED CMOS TECHNOLOGY
The development of the new technology for the 3-D local clustering approach needs to meet a number of objectives, including: 1) a similar floor plan for the PMOSFET and NMOSFET when they are stacked on top of each other; 2) a short interconnect distance between the top and bottom devices in a standard cell; and 3) close proximity between the upper-level device and the substrate. The schematic and layout of an inverter fabricated with our proposed technology are shown in Fig. 1. This process has two main features. First, it uses dummy top and bottom gates to confine the real gates in the self-aligned regions, just like the self-aligned DG process described in [6]
and [7]. Second, the bulk is further utilized as an active layer for making transistors that are very close to the upper DG MOSFET.
The main fabrication steps are shown in Fig. 2. Starting from a silicon-on-insulator (SOI) wafer with a 200-nm buried-oxide film, the top silicon film is thinned to 60 nm by thermal oxidation and oxide etching. A modified shallow trench isolation (STI) process is used to define and isolate the active area. After a 15-nm buffer oxide is grown thermally, a 20-nm nitride is deposited, as shown in Fig. 2(a). The wafer is then patterned, and a shallow trench of 400 nm is etched to define the active area. A 400-nm low temperature oxide (LTO) is deposited to fill the trench and is planarized by chemical-mechanical polishing (CMP) with the nitride serving as the stop layer, as shown in Fig. 2(b). The nitride is then removed and a 200-nm LTO is deposited as the dummy top gate, as shown in Fig. 2(c). The LTO/silicon/oxide stack is patterned and etched using the gate mask, and a 12-nm nitride is deposited. As+ implantation is used to form the source and drain of the bottom NMOS in Fig. 2(d). A via hole is opened at the drain side to connect the drain of the NMOS to the drain of the top PMOS. The contact between the poly-silicon and the underlying n+ diffusion is a tunneling ohmic contact. A 500-nm poly-silicon is then deposited and planarized by CMP, as shown in Fig. 2(e). The nitride is used as the stop layer in the CMP process. The poly-silicon is then etched in TMAH solution to expose the nitride that covers the top single-crystal silicon film. This is a critical step, as the over-etching margin is relatively small. Hot H3PO4 is used to remove the exposed nitride and expose the channel silicon in Fig. 2(f). After dipping for 30 s in HF solution to remove the native oxide at the channel edge, a 150-nm a-Si is deposited and planarized by CMP. B+ implantation is used to dope the source and drain of the top PMOS, as shown in Fig. 2(g). The LTO is removed and the active area of the top PMOS is defined. The buried oxide is removed in BOE solution, and the gate oxide is grown by thermal oxidation at the three exposed channels in Fig. 2(h). During the gate oxidation, the oxidation rate on the poly-silicon source and drain is much faster than that on the single-crystal channel, which gives a thicker isolation oxide and reduces the parasitic capacitance between the thick source/drain and the gate. However, the thick oxide between the gate and the S/D cannot be designed freely to trade off between the source/drain resistance and the gate-to-drain Miller capacitance. A 200-nm in situ doped poly-silicon is deposited, planarized, and defined to locate the poly-silicon inside the self-aligned region, as shown in Fig. 2(i). A slow deposition rate in the reaction-limited regime has to be used to let the poly-silicon fill the channel below the active channel. The following steps are similar to those of the conventional CMOS process.
The SEM picture of the fabricated stacked CMOS inverter is shown in Fig. 3. The total number of masks used in this process is seven, which is no more than in the conventional 2-D CMOS process, owing to the large number of mask-free self-aligned steps used. The measured voltage transfer characteristics shown in Fig. 4 indicate the successful fabrication of a double-layer stacked 3-D CMOS technology and demonstrate the feasibility of a stacked 3-D CMOS technology with the NMOSFETs and PMOSFETs placed close to each other.

Fig. 2. Main process steps of the stacked transistors' local clustering using a 3-D inverter as an example. (a) Starting SOI wafer with oxidation and nitride deposition. (b) Shallow trench isolation. (c) Nitride removal and LTO deposition. (d) LTO/Si/oxide stack patterning, nitride deposition, and S/D implantation. (e) Via hole opening, poly-Si deposition, and CMP. (f) Poly-Si etching and exposed nitride removal in hot H3PO4. (g) Poly-Si deposition, CMP, and S/D implantation. (h) LTO removal, active area definition, bottom oxide removal, and gate oxidation. (i) Poly-Si deposition with in situ doping and patterning. (j) Passive layer deposition, contact hole opening, and metallization.

Fig. 3. SEM picture of the 3-D stacked CMOS inverter.
For practical purposes, the two layers can be considered as one active device layer, but circuits constructed with this special active layer have a smaller footprint. When combined with 3-D packaging technology such as wafer bonding, the closely coupled layers form clusters relative to other layers. The process described in this section is only one example of how the local-clustering 3-D IC concept can be implemented. The concept is general in that it can be implemented through many technologies, such as the technology described in [8].
Fig. 4. Voltage transfer functions of a 3-D inverter fabricated using the 3-D local clustering technology.
Fig. 5. Illustration of capacitance components at drain of the (a) 3-D stacked inverter and (b) 2-D planar inverter.
III. CIRCUIT DESIGN WITH THE 3-D STACKED CMOS TECHNOLOGY
To evaluate the performance of the 3-D stacked IC technology in real circuits, we have to extract the device parameters to be used in simulation. To compare the performance of the 3-D IC with that of conventional planar ICs, we assume that the planar technology uses an ultrashallow S/D extension plus an elevated source/drain whose thickness is similar to the gate thickness of the double-gate MOSFET in the 3-D stacked IC technology. This eliminates the effect of parasitics due to tradeoffs in device design and focuses our study on the delay due to interconnect loading. Nevertheless, it has also been shown that an elevated S/D is important in planar technology to trade off between short channel effects and source/drain series resistance [9].
The capacitance at the inverter output has two major contributions, namely, 1) the capacitance at the drain of the MOSFETs and 2) the wire capacitance. The components of capacitance at the drain of the MOSFETs are shown in Fig. 5. A 3-D device grid is constructed using a 3-D device simulator calibrated to the process parameters to extract all the parasitic capacitances and resistances. Simulation results show that the capacitance at the drain side of the 3-D inverter is about 55% of that of the 2-D inverter after adding up all components, and the relative value of each component is shown in Fig. 6. The Cjdbs_W shown in Fig. 6 is only the contribution along the gate direction. The other part of Cjdbs, namely Cjdbs_L, is not shown in Fig. 6 as it is proportional to the drain length. The Cjdbs_L of the 3-D inverter is half that of the 2-D inverter, so the Cjdbs of the 2-D inverter is about two to three times that of the 3-D inverter. The major capacitance reduction of the 3-D inverter comes from the reduction of the drain-to-substrate capacitance of the PMOSFET (Cjdbs, Cjdbb) and the interconnect capacitance at the output. The interconnect capacitance reduction is about 45%.
The capacitance between Vss and Vdd through the source area is larger than that of the conventional 2-D MOSFET. This is not important, as this capacitance is not on the signal path and the supply voltage is relatively constant.
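The two ratios quoted above (drain capacitance at about 55% of the 2-D value, and an interconnect capacitance reduction of about 45%) can be combined into a rough estimate of the total output load. The sketch below is only illustrative: the normalized 2-D capacitance values and the single-fanout assumption are hypothetical placeholders, not data from this paper, and the fanout gate capacitance is assumed to be the same in both technologies.

```python
def output_load(c_drain, c_wire, c_fanout):
    """Total capacitance at an inverter output, in units of one gate capacitance."""
    return c_drain + c_wire + c_fanout

# Hypothetical 2-D values, normalized to one gate capacitance (NOT from the paper).
C_DRAIN_2D = 1.0
C_WIRE_2D = 0.5
C_FANOUT = 1.0  # assumption: one identical gate is driven, same in 2-D and 3-D

# Ratios taken from the text.
DRAIN_RATIO_3D = 0.55        # 3-D drain capacitance is about 55% of 2-D
WIRE_RATIO_3D = 1.0 - 0.45   # interconnect capacitance is reduced by about 45%

load_2d = output_load(C_DRAIN_2D, C_WIRE_2D, C_FANOUT)
load_3d = output_load(C_DRAIN_2D * DRAIN_RATIO_3D,
                      C_WIRE_2D * WIRE_RATIO_3D,
                      C_FANOUT)
load_ratio = load_3d / load_2d

print(f"2-D load: {load_2d:.3f}, 3-D load: {load_3d:.3f}, ratio: {load_ratio:.2f}")
```

With these placeholder values the 3-D load is roughly three quarters of the 2-D load; the actual ratio depends on how large the drain and wire terms are relative to the driven gate capacitance.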
Fig. 6. Comparison of individual capacitance components between 2-D planar and 3-D stacked inverter at drain. The gate capacitance is used as a reference.
As a practical approach to applying the 3-D CMOS technology in circuit design, we have designed a set of standard cells using the stacked 3-D CMOS technology. This approach conforms to the current CAD infrastructure for the design of large systems. Compared with other 3-D circuit methods that allow arbitrary placement of individual devices at the top and bottom layers, the approach studied here may not be optimal with respect to interconnect minimization, but as will be demonstrated, this simple approach is capable of providing a substantial improvement in the speed–power tradeoff compared with conventional 2-D circuits. A number of standard cells designed with 2-D and 3-D structures are given in Fig. 7, indicating the amount of area saving. The comparison of cell size and intracell load capacitance of standard cells designed with a 2-D planar structure and a 3-D structure is shown in Fig. 8.
To study the feasibility of reducing interconnect loading, a number of circuits have been described in Verilog and automatically placed and routed using the new 3-D standard cells and the conventional 2-D standard cells. Examples of a 4-bit carry-look-ahead adder and of 128-, 256-, and 512-bit carry-select adders based on the 4-bit carry-look-ahead adder are studied. The layout of a 128-bit adder is shown in Fig. 9 to indicate the area saving achieved by the 3-D stacked technology. We found that the area of the 3-D circuits is always about 58% of that of the 2-D circuits, representing an area saving of over 40%. However, the improvement of the critical path delay obtained by using 3-D circuits depends on the size of the circuit, as shown in Fig. 10. If the circuit is small, such as a 4-bit adder, the 3-D circuit shows little improvement in the critical path delay, as the RC delay of each stage is dominated by the intrinsic gate capacitance and output capacitance. When the circuit becomes bigger with increased transistor count, long interconnects with parasitic capacitance larger than the gate capacitance become dominant in the RC delay of the critical path. As shown in Fig. 10, the advantage of the proposed 3-D circuit becomes more and more prominent with increasing transistor count.

Fig. 7. Comparison of 2-D planar and 3-D stacked layout of inverter, two-input NAND, two-input NOR, and four-input NAND gates.

Fig. 8. Comparison of area and interconnect capacitance between 2-D planar technology and local clustering 3-D technology.

Fig. 9. Result of a 128-bit carry-select adder constructed using automatic layout and placement using 2-D planar and 3-D stacked transistor standard cells. The area of the chip using the 3-D standard cells is about 58% of the 2-D case.

Fig. 10. Comparison of critical path delay, circuit area, and total interconnect length of various 2-D and 3-D circuits of the 4-, 128-, 256-, and 512-bit adders.
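The size dependence described above can be illustrated with a toy model. The sketch assumes that wire length scales with the square root of chip area, so an area ratio of 58% shortens wires by a factor of about 0.76, and that wire capacitance grows with the square root of the adder width while the intrinsic stage capacitance stays fixed. The function name, the constants `c_intrinsic` and `c_wire_per_sqrt_bit`, and the scaling law itself are hypothetical and are not taken from the paper's simulations.

```python
import math

AREA_RATIO_3D = 0.58  # from the text: 3-D area is about 58% of the 2-D area

# Assumption: wire lengths shrink with the linear chip dimension, i.e. sqrt(area).
wire_scale_3d = math.sqrt(AREA_RATIO_3D)

def relative_stage_load(n_bits, wire_scale, c_intrinsic=1.0, c_wire_per_sqrt_bit=0.2):
    """Toy load per stage: a fixed intrinsic part plus a wire part that grows
    with sqrt(n_bits). Both default constants are hypothetical."""
    c_wire = c_wire_per_sqrt_bit * math.sqrt(n_bits) * wire_scale
    return c_intrinsic + c_wire

def delay_ratio(n_bits):
    """3-D / 2-D delay ratio, assuming equal drive resistance so delay tracks load."""
    return (relative_stage_load(n_bits, wire_scale_3d)
            / relative_stage_load(n_bits, 1.0))

for n in (4, 128, 256, 512):
    print(f"{n:4d}-bit adder: 3-D/2-D delay ratio = {delay_ratio(n):.3f}")
```

In this model the ratio stays close to 1 for the 4-bit case and falls as the wire term grows, which is the qualitative trend the text attributes to Fig. 10.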
IV. THERMAL ANALYSIS ON THE STACKED 3-D CMOS TECHNOLOGY
With the increasing number of transistors placed on the same chip, thermal reliability has become a very important issue in conventional planar IC design. This problem can be more serious when 3-D integration is applied in circuit design because of the higher power dissipated per unit area. In addition, upper-layer transistors are surrounded by dielectrics that present a large thermal resistance to the substrate, which makes the normal heat dissipation path through the substrate difficult. The temperature increment can be calculated by the thermal model in [10] and [11]. In this paper, we deduce the relationship between ∆T and dielectric thickness with simple math by simplifying the conditions. The temperature increment of the active layer can be calculated by
\[ \Delta T = T_{\mathrm{active}} - T_{\mathrm{ambient}} = P \times R_{\mathrm{thermal}} \tag{1} \]
where P is the power flow from the active layer to the ambient and Rthermal is the thermal resistance along the heat flow path. As the thermal resistance of silicon is 150 times lower than that of the oxide, the temperature variation within the same silicon layer can be neglected. Hence, the thermal resistance of the dielectric between active layers can be expressed as
\[ k_{\mathrm{dielectric}} = \frac{t_{\mathrm{Die}}}{A \times k_{\theta\_\mathrm{Die}}} \tag{2} \]
where kθ_Die is the intrinsic thermal conductivity of the dielectric, tDie is the dielectric thickness, and A is the area connecting to the ambient. If the total number of active layers equals n, the temperature increment ∆Ti of the ith active layer can be expressed by
\[ \Delta T_i = \left( k_{\mathrm{package}} + \frac{t_{\mathrm{sub}}}{A \times k_{\theta\_\mathrm{Si}}} \right) \sum_{j=1}^{n} P_j + \sum_{j=2}^{i} \left( \frac{t_{\mathrm{Die}\_j}}{A \times k_{\theta\_\mathrm{Die}}} \sum_{k=j}^{n} P_k \right). \tag{3} \]

Fig. 11. Temperature increment between the top active layer and the substrate as a function of dielectric thickness between the active layers in 3-D technologies (power density is assumed to be 0.5 W/mm² per layer).
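Equation (3) can be evaluated numerically to see how the dielectric thickness affects the upper-layer temperature. The sketch below reads (3) as follows: the package and substrate carry the power of all n layers, while the dielectric below layer j carries the power of layers j through n. Treating k_package as a package thermal resistance in K/W is an interpretation, and the package resistance, substrate thickness, and thermal conductivities are hypothetical placeholders; only the 0.5 W/mm² per-layer power density, the 150:1 conductivity ratio, and the 200-nm buried oxide come from the text.

```python
def layer_temperature_rise(i, powers, t_dielectrics, area,
                           k_package, t_sub, k_si, k_die):
    """Temperature rise (K) of active layer i (1-based) following Eq. (3).

    powers[j-1]        -> P_j, power of layer j in W (j = 1..n)
    t_dielectrics[j-2] -> t_Die_j, dielectric thickness below layer j in m (j = 2..n)
    k_package is interpreted here as a package thermal resistance in K/W.
    """
    n = len(powers)
    if not 1 <= i <= n:
        raise ValueError("layer index i must be between 1 and n")
    if len(t_dielectrics) != n - 1:
        raise ValueError("need one dielectric thickness per layer above the first")
    # Package and substrate carry the power of all layers.
    delta_t = (k_package + t_sub / (area * k_si)) * sum(powers)
    # The dielectric below layer j carries the power of layers j..n.
    for j in range(2, i + 1):
        r_die = t_dielectrics[j - 2] / (area * k_die)
        delta_t += r_die * sum(powers[j - 1:])
    return delta_t

# 1 mm^2 of chip area with 0.5 W per layer (0.5 W/mm^2, as in Fig. 11).
AREA = 1e-6            # m^2
POWERS = [0.5, 0.5]    # W, two active layers
# Hypothetical placeholders (not from the paper), with k_si / k_die = 150.
PARAMS = dict(area=AREA, k_package=20.0, t_sub=500e-6, k_si=150.0, k_die=1.0)

rise_bottom = layer_temperature_rise(1, POWERS, [0.2e-6], **PARAMS)
rise_top_thin = layer_temperature_rise(2, POWERS, [0.2e-6], **PARAMS)   # 200-nm oxide
rise_top_thick = layer_temperature_rise(2, POWERS, [2.0e-6], **PARAMS)  # 2-um oxide

print(f"bottom layer: {rise_bottom:.2f} K")
print(f"top layer, 0.2 um dielectric: {rise_top_thin:.2f} K")
print(f"top layer, 2.0 um dielectric: {rise_top_thick:.2f} K")
```

In this model the extra rise of the top layer over the bottom layer grows linearly with the dielectric thickness, which is why keeping the two layers close together limits upper-layer heating.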
It is obvious that the thicker the dielectric between the upper layer and the bottom substrate, the higher the temperature increment. On the other hand, if the top layer is placed close to the bottom layer, forming a cluster of active layers, heat dissipation from the upper layer through the substrate becomes easier and the increase in temperature can be mitigated. The result is demonstrated in Fig. 11, where the increase in temperature as a function of dielectric thickness between the substrate and the top layer of silicon is simulated using Davinci. The development of closely coupled transistors, with the top transistors close to the bottom layer of transistors, is therefore advantageous from the heat dissipation perspective.
Unlike the approach of arbitrarily placing active devices in a multiple-layer stacked IC technology, placing two active layers close to each other will not significantly increase the interlayer capacitance, as the interconnect between the two layers consists only of local wire routing within standard cells. Assuming $P = \frac{1}{2}\alpha C V_{DD}^{2} f_C$ and a power density equal to 0.5 W/mm² per layer at VDD = 1.8 V, the increase in temperature relative to the ambient as a function of power supply voltage is given in Fig. 12. The increase in temperature is plotted against the axis on the right of the figure. It is observed that the temperature increase in a 3-D stacked IC is indeed more serious than that in conventional 2-D planar circuits at the same power supply voltage. However, owing to the smaller loading of the 3-D stacked IC, a higher circuit speed can be achieved with the 3-D stacked CMOS technology at the same power supply voltage, which is also shown in Fig. 12. So if the same critical path delay is desired, the 3-D stacked CMOS approach can achieve the required performance at a lower supply voltage. Thus, when we compare the temperature under the constraint of the same critical path delay, 3-D stacked ICs can actually have a smaller temperature rise because of the lower supply voltage required. This can also be observed from Fig. 12. For example, a similar delay can be achieved with a 3-D design at a 1.2-V supply and a 2-D design at a 1.5-V supply. Because of the lower supply voltage, the 3-D design will perform better in temperature as well. As a result, the 3-D stacked CMOS technology can actually provide a better tradeoff among speed, power, and device operating temperature, in addition to the area saving.

Fig. 12. Relative delay and heat dissipation as a function of supply voltage for circuits based on conventional 2-D and local clustering 3-D technologies (assuming $P = \frac{1}{2}\alpha C V_{DD}^{2} f_C$ and a power density equal to 0.5 W/mm² per layer at VDD = 1.8 V).

V. CONCLUSION
In this paper, we have demonstrated the feasibility of applying the 3-D IC technology to reduce interconnect loading. A process to fabricate 3-D ICs with closely coupled upper and lower layers has been successfully developed. Based on this technology, the concept of local clustering, which applies 3-D integration to standard cells, is proposed. Many different cells have been constructed using layout tools, with their parasitic components extracted using a 3-D device simulator.
It is demonstrated that a significant reduction in interconnect loading can be achieved. The local clustering approach to forming 3-D ICs based on standard cells is compatible with the current IC design infrastructure and CAD tools. Through detailed thermal analysis, we have also shown that the thermal problem in 3-D ICs is not as serious as perceived when the local clustering approach is used between layers. The reduction of interconnect loading can significantly reduce the power dissipation.

REFERENCES
[1] R. Zhang, K. Roy, C. Koh, and D. B. Janes, “Power trends and performance characterization of 3-dimensional integration for future technology generations,” in Proc. IEEE Int. Symp. Quality Electron. Des., 2001, pp. 217–222.
[2] J. D. Meindl, “Beyond Moore’s law: The interconnect era,” IEEE Comput. Sci. Eng., vol. 5, no. 1, pp. 20–24, Jan./Feb. 2003.
[3] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, “3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration,” Proc. IEEE, vol. 89, no. 5, pp. 602–633, May 2001.
[4] V. W. C. Chan, P. C. H. Chan, and M. Chan, “Large-grain polysilicon MOSFET for 3-D integrated circuits,” in Proc. IEEE SOI Conf., Oct. 2000, pp. 22–23.
[5] L. Xue, C. C. Liu, H.-S. Kim, S. Kim, and S. Tiwari, “Three-dimensional integration: Technology, use, and issues for mixed-signal applications,” IEEE Trans. Electron Devices, vol. 50, no. 3, pp. 601–609, Mar. 2003.
[6] H.-S. P. Wong, K. K. Chan, and Y. Taur, “Self-aligned (top and bottom) double-gate MOSFET with a 25 nm thick silicon channel,” in IEDM Tech. Dig., 1997, pp. 427–430.
[7] R. S. Shenoy and K. C. Saraswat, “Novel process for fully self-aligned planar ultrathin body double-gate FET,” in Proc. SOI Conf., 2004, pp. 190–191.
[8] X. Wu, P. C. H. Chan, S. Zhang, C. Feng, and M. Chan, “A three-dimensional stacked fin-CMOS technology for high-density ULSI circuits,” IEEE Trans. Electron Devices, vol. 52, no. 9, pp. 1998–2003, Sep. 2005.
[9] J. J. Sun, R. F. Bartholomew, K. Bellur, A. Srivastava, C. M. Osburn, and N. A. Masnari, “The effect of the elevated source/drain doping profile on performance and reliability of deep submicron MOSFETs,” IEEE Trans. Electron Devices, vol. 44, no. 9, pp. 1491–1498, Sep. 1997.
[10] T.-Y. Chiang, K. Banerjee, and K. C. Saraswat, “A new analytical thermal model for multilevel ULSI interconnects incorporating via effect,” in Proc. Interconnect Technol. Conf., Jun. 4–6, 2001, pp. 92–94.
[11] T.-Y. Chiang, S. J. Souri, C. O. Chui, and K. C. Saraswat, “Thermal analysis of heterogeneous 3-D ICs with various integration scenarios,” in IEDM Tech. Dig., 2001, pp. 31.2.1–31.2.4.
Xinnan Lin received the B.S. degree in computer science from Peking University, Beijing, China, in 1997, and is currently working toward the Ph.D. degree at the Electronics Engineering Department, Hong Kong University of Science and Technology, Kowloon, Hong Kong. From 1997 to 1999, he was with the Institute of Microelectronics, Peking University.
Shengdong Zhang (M’04–A’04–M’04) received the B.S. and M.S. degrees from Southeast University, Nanjing, China, in 1984 and 1992, respectively, and the Ph.D. degree from Peking University, Beijing, China, in 2002, all in electrical and electronic engineering. From 1984 to 1985, he was an Associate Engineer at the Beijing Aircraft Technology Institute, Beijing, China, working on the design and fabrication of pressure sensors. In 1985, he joined the Nanjing Electronic Device Institute, where he first worked on the fabrication of photoelectronic devices based on silicon target and later was engaged in the development of AM-LCDs. From October 1996 to September 1998, he was a Research Assistant at the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology (HKUST), Kowloon, Hong Kong, where he worked on the research and development of silicon-on-insulator (SOI) and poly-Si thin film transistor (TFT) devices. From 2000 to 2002, he was a Visiting Scholar at the Department of Electrical and Electronic Engineering, HKUST. In 2002, he joined Peking University and is currently an Associate Professor. His current research areas are 3-D MOS device, poly-Si TFT, SOI device, and sub-tenth-micrometer device technologies.
Xusheng Wu received the B.S. degree in microelectronics from the Department of Computer Science and Technology, Peking University, Beijing, China, in 2001. He is currently pursuing the Ph.D. degree in the Department of Electrical and Electronics Engineering, Hong Kong University of Science and Technology, Hong Kong. His research interests include fabrication and analysis on CMOS SOI devices, such as UTB devices, double-gate FinFET structure, 3-D integrated circuits and stacked SRAM cells, device design, and characterization, and device modeling.
Mansun Chan (S’92–M’95–SM’01) received the B.S. degree (highest honors) in electrical engineering and the B.S. degree (highest honors) in computer science from the University of California at San Diego, La Jolla, in 1990 and 1991, respectively, and the M.S. and Ph.D. degrees from the University of California at Berkeley, in 1994 and 1995, respectively. During his undergraduate study, he was with Rockwell International Laboratory, working on heterojunction bipolar transistor (HBT) modeling, where he developed the self-heating SPICE model for HBT. His research at Berkeley covered a broad area in silicon devices ranging from process development to device design, characterization, and modeling. A major part of his work was on the development of record-breaking silicon-on-insulator (SOI) technologies. He has also maintained a strong interest in device modeling and circuit simulation. He is one of the major contributors to the unified BSIM model for SPICE, which has been accepted by most U.S. companies and the Compact Model Council (CMC) as the first industrial standard MOSFET model. In January 1996, he joined the EEE Faculty at the Hong Kong University of Science and Technology. Between July 2001 and December 2002, he was a Visiting Professor at the University of California at Berkeley and the Codirector of the BSIM program. He is still currently consulting on the development of next-generation compact models. His research interests include nanodevice technologies, image sensors, SOI technologies, high performance IC, 3-D circuit technology, device modeling, and Nano BIOMEMS technology. Dr. Chan was the recipient of the UC Regents Fellowship, Golden Keys Scholarship for Academic Excellence, SRC Inventor Recognition Award, Rockwell Research Fellowship, R&D 100 Award (for the BSIM3v3 project), Teaching Excellence Appreciation Award (1999), and Distinguished Teaching Award (2004), among others.