EE241 Final Report
Building High Speed Sense Amplifier For SRAMs With Offset Compensation
Yida Duan ([email protected])

Abstract - Building high-speed sense amplifiers for SRAM has become increasingly challenging in deep-submicron technologies because of severe process variation. Recent publications have used offset compensation techniques to deal with this problem [2]. In this paper, a new sense amplifier with offset compensation, the offset-compensated current sense amplifier (OCCSA), is proposed. For comparison, the conventional voltage sense amplifier (VSA), the clamped-bitline current sense amplifier (CSA) [1], and a recently published offset-compensated sense amplifier, the non-strobed regenerative voltage sense amplifier (NSR-VSA) [2], are studied alongside OCCSA. The simulation results show that OCCSA is the fastest of the four at the same yield level, at a small cost in extra power and area.

Index Terms - Conventional voltage sense amplifier (VSA), Clamped-bitline current mode sense amplifier (CSA), Non-strobed regenerative voltage sense amplifier (NSR-VSA), Offset compensated current mode sense amplifier (OCCSA)
Introduction
In recent years, random offset has become a limiting factor on the speed of sense amplifiers in SRAMs. To ensure robust operation of all sense amplifiers across the die, each one must be enabled only after the input signal has grown larger than 99% of the individual offsets. This is accomplished by inserting a delay between the input multiplexer select signal and the reset release (or sense enable) of the sense amplifier. This delay is referred to as the "time margin" in this paper. In the case of a minimum-sized cell and a large bitline capacitance, the time margin can be much larger than the sense time. Offset compensation is therefore an attractive way to improve sense amp performance. To characterize the speed of a sense amp correctly, response time is defined in this paper as the sum of sense time and time margin:

$$T_{\mathrm{response}} = T_{M}\ (\text{at 99\% yield}) + T_{\mathrm{sense}}$$

Sense amplifier offset compensation techniques can be classified as static or dynamic. Although it may require extra power, dynamic offset compensation is more suitable for SRAM applications because it requires minimal circuit complexity and area overhead. Dynamic compensation does not need extra logic for offset tuning or expensive factory laser trimming. In addition, dynamic offset compensation works cycle by cycle, which makes it tolerant of sudden environmental changes such as temperature. Therefore, only dynamic offset compensation is investigated in this paper.
Experiment Setup
Simulation results are generated using the 45nm LK PTM model with a 1 V supply voltage. The input of each sense amp is driven by a 40 uA current source, which is equivalent to a minimum-sized SRAM cell with a cell ratio of 1.5. The bitline capacitance is assumed to be 100 fF. A relatively large pmos (1um/45nm) is used as the bitline multiplexor to minimize its effect on response time. The output of the sense amplifier is assumed to have a small load capacitance of 0.2 fF. All transistors in the sense amps have minimum length (45 nm). A small width (pmos/nmos = 120nm/60nm) is used for most transistors except the most critical ones. Ideal capacitors of 4 fF each are used as the offset-nulling capacitors in both NSR-VSA and OCCSA. This value is chosen to be 10X larger than the total gate capacitance of the reset switches, to minimize the offset error caused by mismatch in charge injection.

To simulate offset voltage accurately, the standard deviation of Vt for both the 60 nm nmos and the 120 nm pmos is assumed to be 50 mV. The threshold standard deviation of a 1 um nmos is set to 12 mV (it scales inversely with the square root of device area). In the HSPICE simulation, threshold variations of all transistors are included except those of the bitline multiplexors and output buffers, which are not critical to sense amp offset. Variations in channel length and width are not considered. Ideal clocks with sharp transition edges (1 ps) are used in simulation. Any timing uncertainty is ignored for simplicity.
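As a quick sanity check on the setup numbers, the sketch below reproduces the 12 mV figure from the 50 mV reference value, assuming the threshold spread scales as 1/sqrt(W·L) (Pelgrom-style area scaling), and computes the ideal bitline discharge rate I/C. The function names are illustrative only and are not part of the original setup.

```python
import math


def scaled_sigma_vt(sigma_ref_mv, w_ref_nm, w_new_nm, l_ref_nm=45.0, l_new_nm=45.0):
    """Scale a reference Vt spread to a new device size.

    Assumption: sigma(Vt) is proportional to 1/sqrt(W*L).
    """
    return sigma_ref_mv * math.sqrt((w_ref_nm * l_ref_nm) / (w_new_nm * l_new_nm))


def bitline_slew_mv_per_ps(i_cell_a, c_bl_f):
    """Ideal discharge rate dV/dt = I/C of the bitline, in mV/ps."""
    volts_per_second = i_cell_a / c_bl_f
    return volts_per_second * 1e3 / 1e12


# Values taken from the setup: 50 mV at W = 60 nm, 40 uA cell current, 100 fF bitline.
sigma_1um_mv = scaled_sigma_vt(50.0, 60.0, 1000.0)
slew = bitline_slew_mv_per_ps(40e-6, 100e-15)

print(f"sigma(Vt) for 1 um nmos: {sigma_1um_mv:.1f} mV")
print(f"bitline slew: {slew:.2f} mV/ps")
```

The slew figure ignores the multiplexor resistance and any clamping, so it is only an upper bound on how fast the input signal can develop.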
Proposed Solution
The proposed sense amp is the offset-compensated current-mode sense amplifier (OCCSA), an improved version of the clamped-bitline current-mode sense amplifier. As the schematic shows, its basic structure is the same as that of the clamped-bitline current-mode sense amplifier, with two 4 fF offset-nulling capacitors added in series with the regeneration loop. A detailed analysis is provided in the following section.
Analysis
(1) Conventional voltage sense amplifier (VSA)
Transient Simulation
The operation of VSA is as follows. In the reset phase, Ysel is high and SAEN is low: the bitlines are disconnected from the sense amp, regeneration is disabled through M5, and the output nodes are precharged to Vdd through M6 and M7. After the reset phase, Ysel is driven low while SAEN is kept low. The sense amp inputs are now connected to the bitlines through the pmos multiplexors, and the input current starts to discharge BLb. This interval is the time margin that lets the input differential voltage grow larger than the offset. Then SAEN is pulled high and the sense phase begins. M5 turns on, and the output nodes are quickly discharged through M1 and M2 to Vdd − |Vtp|, at which point M3 and M4 turn on and regeneration starts. VSA does not consume power during the reset phase because it is cut off from the supply. However, it has a large random offset and a long sense time. Threshold variations of all the regeneration devices M1-M4 contribute to the offset, and the large bitline capacitance loads the amplifier during regeneration. It can be shown that the regeneration time constant is roughly $C_{BL}/g_{m1,2}$.
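To illustrate why loading the regeneration nodes with the bitline is costly, here is a minimal sketch using the C/gm time constant together with the usual small-signal latch model, in which the differential voltage grows as exp(t/τ). The transconductance value and the 1 fF internal-node capacitance are hypothetical placeholders chosen only for illustration; they are not taken from the simulations in this report.

```python
import math


def regen_time_constant_s(c_load_f, gm_s):
    """Regeneration time constant tau = C / gm."""
    return c_load_f / gm_s


def regen_time_s(c_load_f, gm_s, dv_in_v, dv_out_v):
    """Time for dv_in * exp(t / tau) to reach dv_out (small-signal latch model)."""
    tau = regen_time_constant_s(c_load_f, gm_s)
    return tau * math.log(dv_out_v / dv_in_v)


GM_ASSUMED_S = 200e-6      # hypothetical gm of the regeneration devices
C_BITLINE_F = 100e-15      # bitline capacitance from the experiment setup
C_INTERNAL_F = 1e-15       # hypothetical small internal-node capacitance

tau_bitline = regen_time_constant_s(C_BITLINE_F, GM_ASSUMED_S)
tau_internal = regen_time_constant_s(C_INTERNAL_F, GM_ASSUMED_S)

print(f"tau with bitline load:  {tau_bitline * 1e12:.0f} ps")
print(f"tau with internal node: {tau_internal * 1e12:.0f} ps")
print(f"50 mV -> 500 mV with bitline load: "
      f"{regen_time_s(C_BITLINE_F, GM_ASSUMED_S, 0.05, 0.5) * 1e12:.0f} ps")
```

Whatever gm actually is, the ratio of the two time constants equals the ratio of the load capacitances, which is the reason the current-mode amplifiers discussed next keep the bitline off the regeneration nodes.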
Schematics
(2) Clamped-bitline current sense amplifier (CSA) [1]
Schematics
Transient Simulation
The operation of CSA is quite different from that of VSA. In the reset phase, Ysel is high and the bitlines are disconnected from the sense amp inputs. The sense amp is reset to its metastable point by M7-M9. Between the reset phase and the sense phase, Ysel is driven low while Reset is kept high. M8 and M9 keep the sense amp at its metastable point while the input current starts to discharge both bitlines. A current starts to flow through M7 and slowly ramps up. It can be shown that

$$I_{M7} = \frac{I_{in}}{2}\left[1 - \exp\left(-\frac{t}{R_{M7}\cdot 2C_{BL}}\right)\right],$$

where $R_{M7}$ is the small-signal resistance of M7 in the linear region and $I_{in}$ is the input current at BLb. After $I_{M7}$ grows larger than the offset current, Reset is pulled low and the sense phase begins. The conducting path through M7 is shut off, but the currents through M3, M4 and M5, M6 cannot change instantaneously, which creates a current imbalance between the two branches (M1, M3 and M2, M4). This imbalance triggers the regeneration loop formed by M1-M4. The sense time of CSA is much smaller than that of VSA because the large bitline capacitances are connected to the source nodes of M3 and M4 and therefore do not load the regeneration devices. In addition, the bitline voltages are clamped to a value close to Vdd at the end of the sense cycle, which saves energy by avoiding a complete discharge of the bitlines. However, the sense amp is not cut off from the supply in the reset phase, so CSA consumes standby power. CSA also presents a large offset. M5 and M6 have a negligible effect on the offset current compared to M1-M4 because they are in the linear region, so their Ids depends only weakly on threshold voltage variations. M1-M4, however, give rise to an input current offset. To ensure robustness, the input source has to provide a current much larger than this offset current variation.

(3) Offset-compensated current sense amplifier (OCCSA)
Schematic
Transient Simulation
The operation of OCCSA is exactly the same as that of CSA, and it has all the benefits of CSA. However, the offsets contributed by M1-M4 are sampled on the two 4 fF offset-nulling capacitors in the reset phase. In the sense phase, these two capacitors are in series with the regeneration loop, which effectively removes the offset. As a result, both the required input current drive and the time margin can be significantly reduced relative to CSA, with a reasonable increase in total area.

(4) Non-strobed regenerative voltage sense amplifier (NSR-VSA) [2]
Schematics
Transient Simulation
NSR-VSA does not need any time margin. In the reset phase, the bitline multiplexor is off. M6 and M7 are on, shorting the inputs of the two regeneration inverters to their outputs. Any offset caused by mismatch between these two inverters is sampled on the 4 fF offset-nulling capacitors. M8 is off, which opens the regeneration loop. At the end of the reset phase, the bitline multiplexor turns on shortly (~10 ps) before Reset goes low (this delay eliminates the effect of clock feed-through from the multiplexor). At the beginning of reset phase, the voltage at X is slightly larger than that at Y. This is because the pmos and nmos switches inject different types of charge, which is used to avoid false regeneration, as mentioned in [3]. Initially after reset release, the voltage at X falls at the same rate as the bitline discharges. This continues until the Vgs of M5 reaches Vt, at which point M5 turns on and closes the regeneration loop. Regeneration then quickly drives down the output. NSR-VSA is fast during regeneration because the bitline capacitance is decoupled from the output of the sense amp by a 4 fF offset-nulling capacitor and therefore does not load it. In addition, it attempts to further reduce the response time by eliminating the time margin. However, eliminating the time margin increases the sense time, because regeneration starts only after the input voltage falls below a built-in threshold (~100 mV), which is fixed by the threshold of M5 and the charge injection of M5 and M7. Furthermore, NSR-VSA requires a larger area for the offset-nulling capacitors and consumes power during the reset phase.

Simulation & Comparison Results
(1) Sense Time
Figure 1. Input differential voltage vs. Sense Time
In Figure 1, sense time is plotted as a function of input voltage level with zero offset variation. The sense time is measured from reset release (or sense amp enable) to the time when the correct output node crosses Vdd/2. CSA and OCCSA clearly have smaller sense times than VSA, because the bitline capacitance does not load these amplifiers during the sense phase. CSA is slightly faster than OCCSA because the offset-nulling capacitors and extra switches add load in OCCSA. The plot shows that NSR-VSA has a very large sense time. This does not mean that NSR-VSA is slow: its time margin is zero, so its sense time is equal to its response time.
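The sense-time measurement described above (reset release to the Vdd/2 crossing of the output) can be sketched as a small post-processing step on a sampled waveform. This is only an illustration of the definition, not the measurement script used for Figure 1; the waveform samples and the reset-release time below are made up.

```python
def crossing_time(times, values, level):
    """Return the first time the waveform crosses `level`.

    Uses linear interpolation between adjacent samples.
    Returns None if the waveform never crosses the level.
    """
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None


VDD = 1.0                                   # supply voltage from the setup
times_ps = [0.0, 10.0, 20.0, 30.0, 40.0]    # hypothetical sample times
out_v = [1.0, 0.95, 0.7, 0.3, 0.05]         # hypothetical falling output
reset_release_ps = 5.0                      # hypothetical reset-release time

t_cross = crossing_time(times_ps, out_v, VDD / 2)
sense_time_ps = t_cross - reset_release_ps
print(f"sense time: {sense_time_ps:.1f} ps")
```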
(2) Input referred offset & Yield
Figure 2. Monte Carlo Simulation of Offset Voltage/Current

Table 1. Offset Voltage Variance
Sense Amp    Offset voltage variance (VOS)
VSA          47 mV
NSR-VSA      8.1 mV
CSA          15.8 mV
OCCSA        1.5 mV

Figure 2 shows a 100-point Monte Carlo simulation of the offset voltages or currents. To convert a current offset to a voltage offset, the current offset is multiplied by the equivalent small-signal resistance of M7, which is 585 Ohm as obtained from simulation. Table 1 shows that VSA has the largest offset. OCCSA achieves an offset variance smaller than 2 mV, 25X smaller than VSA and ~10X smaller than CSA, because of offset compensation, and the offset of NSR-VSA is 5X lower than that of VSA. Although CSA appears to have a small offset voltage-wise, it is the least robust of the four because its current offset is comparable to the input driving current.

Figure 3. Time Margin vs. Percentage yield
In Figure 3, percentage yield is plotted as a function of time margin for a fixed input current (40 uA) and bitline capacitance (100 fF). NSR-VSA is not plotted because it does not require a time margin. Each data point is generated by a 100-point Monte Carlo simulation. To achieve 99% yield, OCCSA requires a time margin of only 50 ps, while VSA needs at least ~2 ns. It is interesting to note that the yield of CSA saturates at roughly 42% for large time margins. This is explained by its large offset current (27 uA), which is comparable to the maximum input current (40 uA). CSA is therefore not robust at such a small input current level. From Figure 3, it is clear that OCCSA is faster than both VSA and CSA, because the response time of VSA is at least 2 ns and CSA cannot achieve the 99% yield level at the input current (40 uA) used in this setup. What remains unclear is whether OCCSA is faster than NSR-VSA.

(3) Response Time
Figure 4. Monte Carlo Simulation of Response Time
Figure 4 shows a 100-point Monte Carlo simulation of response time for OCCSA and NSR-VSA. For OCCSA, a 50 ps time margin is used. To achieve a 99% yield level, the response time is taken to be the average response time plus three standard deviations:

$$T_{\mathrm{response}} = \mathrm{Avg}(T_{\mathrm{response}}) + 3\sigma(T_{\mathrm{response}})$$

The results are summarized in Table 2. OCCSA is twice as fast as NSR-VSA at the same yield level. Note that in Figure 4, OCCSA has less response time variation than NSR-VSA. This can be explained by the larger-than-needed time margin inserted in OCCSA.

Table 2. Performance Summary
Sense Amp                   VSA      CSA          NSR-VSA   OCCSA
Total Response Time (ps)    > 2000   Not Robust   133.3     62
Reset Power (uW)            0.13     36.1         48.9      36.1
Area (um2)                  0.052    0.107        0.451     0.473
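The arithmetic behind this section can be checked with a short sketch: the mean-plus-three-sigma response time, the speed ratios implied by Table 2, and the current-to-voltage offset conversion through the 585 Ohm resistance of M7. The per-run response-time samples below are invented for illustration (the raw Monte Carlo data are not given in this report); only the Table 2 entries, the 27 uA offset current, and the 585 Ohm value come from the text.

```python
import statistics


def response_time_3sigma(samples_ps):
    """Mean plus three sample standard deviations of the response time."""
    return statistics.mean(samples_ps) + 3 * statistics.stdev(samples_ps)


def current_offset_to_mv(i_offset_a, r_m7_ohm):
    """Convert a current offset to an equivalent voltage offset in mV."""
    return i_offset_a * r_m7_ohm * 1e3


# Hypothetical Monte Carlo response times, for illustration only.
example_samples_ps = [58.0, 60.0, 62.0, 59.0, 61.0]
print(f"3-sigma response time: {response_time_3sigma(example_samples_ps):.1f} ps")

# Values from Table 2 (VSA is a lower bound, so its ratio is a lower bound too).
T_VSA_MIN_PS = 2000.0
T_NSR_VSA_PS = 133.3
T_OCCSA_PS = 62.0
speedup_vs_nsr = T_NSR_VSA_PS / T_OCCSA_PS
speedup_vs_vsa_min = T_VSA_MIN_PS / T_OCCSA_PS
print(f"OCCSA vs NSR-VSA: {speedup_vs_nsr:.2f}x")
print(f"OCCSA vs VSA: more than {speedup_vs_vsa_min:.0f}x")

# CSA offset: 27 uA through 585 Ohm.
csa_offset_mv = current_offset_to_mv(27e-6, 585.0)
print(f"CSA equivalent offset: {csa_offset_mv:.1f} mV")
```

The converted CSA offset agrees with the 15.8 mV entry in Table 1, and the ratios are consistent with the "twice as fast" and roughly 30X figures quoted in the text.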
(4) Reset Power and Area
The area of each sense amp is estimated from the transistor dimensions (Σ W·L). Bitline multiplexors are not included in the area calculation. The density of the offset-nulling capacitors is assumed to equal that of MOS capacitors. In reality, offset-nulling capacitors can be implemented as vertical or planar metal capacitors on top of the sense amp to reduce area. The results are summarized in Table 2. It should be noted that the area in Table 2 is an underestimate for VSA and CSA because design rules are not taken into consideration. The actual area of VSA and CSA is likely to be considerably larger. The actual sizes of NSR-VSA and OCCSA, however, are not expected to differ much from the estimate because they are dominated by the offset-nulling capacitors.

Among the four sense amps, VSA consumes virtually zero reset power. Because it is cut off from the supply, the only power it consumes in the reset phase is due to leakage. CSA and OCCSA consume equal amounts of reset power because they are in the same configuration during the reset phase. NSR-VSA consumes slightly more power in reset than CSA and OCCSA because of the cascode effect of the bitline clamp pmos devices in CSA and OCCSA. The power overhead of OCCSA and NSR-VSA is not a major problem in most cases: only one sense amp is used for several columns of SRAM cells, so the reset power of the sense amps is only a small fraction of the total power in high-speed SRAMs.
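The Σ W·L area estimate used in this section can be sketched as follows. The device list is hypothetical (the report does not list the sizes of the individual transistors), so the result illustrates the method only and is not meant to reproduce the Table 2 entries.

```python
def active_area_um2(devices_nm):
    """Sum W*L over a list of (width_nm, length_nm) pairs, returned in um^2."""
    total_nm2 = sum(w * l for w, l in devices_nm)
    return total_nm2 * 1e-6  # 1 um^2 = 1e6 nm^2


# Hypothetical device list: two minimum pmos (120nm/45nm) and two minimum nmos (60nm/45nm).
example_devices = [(120.0, 45.0), (120.0, 45.0), (60.0, 45.0), (60.0, 45.0)]
print(f"estimated active area: {active_area_um2(example_devices):.4f} um^2")
```

Because the estimate counts only gate area, it leaves out contacts, spacing, and wells, which is why the text treats the VSA and CSA numbers as underestimates.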
Conclusion & Future Work
In this paper, a new sense amplifier for SRAM, the offset-compensated current sense amplifier (OCCSA), is proposed. Among the four sense amplifiers studied, HSPICE simulation shows that OCCSA achieves the highest speed at the same level of robustness. At the fixed input current (40 uA) and bitline capacitance (100 fF) used in the setup, OCCSA is about 30X faster than VSA and 2X faster than NSR-VSA, with reset power and area comparable to those of NSR-VSA. Timing uncertainty is ignored in this study. Possible future work includes understanding the cost and limitations of OCCSA under time margin uncertainty.
References
[1] Travis N. Blalock, Richard C. Jaeger, “A high-speed clamped bit-line current-mode sense amplifier”, IEEE Journal of Solid-State Circuits, Vol. 26, No. 4, April 1991
[2] Naveen Verma, Anantha P. Chandrakasan, “A High-Density 45nm SRAM Using Small-Signal Non-Strobed Regenerative Sensing”, ISSCC 2008
[3] Bernhard Wicht, Thomas Nirschl, Doris Schmitt-Landsiedel, “Yield and Speed Optimization of a Latch-Type Voltage Sense Amplifier”, IEEE Journal of Solid-State Circuits, Vol. 39, No. 7, July 2004
[4] M. Matsui, et al., “A 200MHz 13mm2 2-D DCT Macrocell Using Sense-Amplifying Pipeline Flip-Flop Scheme”, IEEE Journal of Solid-State Circuits, Vol. 29, No. 12, Dec 1994
[5] A. Chrisanthopoulos, Y. Moisiadis, Y. Tsiatouhas and A. Arapoyanni, “Comparative study of different current mode sense amplifiers in submicron CMOS technology”, IEEE Proc. Circuits Devices Syst., Vol. 149, No. 3, June 2002