ISSCC 2008 / SESSION 4 / MICROPROCESSORS / 4.7

Circuit Design for Voltage Scaling and SER Immunity on a Quad-Core Itanium® Processor

Dan Krueger¹, Erin Francom¹, Jack Langsdorf²

¹Intel, Fort Collins, CO
²Intel, Hudson, MA

The 700mm² 65nm Itanium® processor [1] doubles the number of cores over its predecessor [2], from 2 to 4. It also adds a system interface that is roughly as large as two cores, including six QuickPath interconnects and four FBDIMM channels. This 3× increase in logic circuits per socket presents two major circuit challenges that are addressed in this paper. First, the chip is increasingly dependent on voltage-frequency scaling to fit in an acceptable power envelope. The resulting reduction in typical operating voltage makes it challenging to design the approximately 10 million non-static logic instances; this count excludes the L2 and L3 caches. Key circuit changes and analysis methods that improve low-voltage operation are shown. Second, enterprise reliability requires that per-socket SER not increase over the previous processor generation. With triple the logic states per socket, we maintain acceptable SER by extensive use of SER-hardened circuits.

For the chip to achieve its power and performance targets, the cores must enable the power-frequency management system to vary the core voltage widely, and system interface power must be contained by running at as low a voltage as possible. Process variation, a pulse-based design methodology, and the sheer size of the chip make this quite difficult. To achieve a wide operating region, an analysis of all non-static circuits was developed, requiring functionality across 7 process corners, 0.7 to 1.35V, and -10°C to 125°C with process variation. To keep yield loss due to the 10 million non-static circuits on the die below 1%, a total of 6.4σ of transistor length, width, and Vt variation is applied to each FET as a function of the effect it has on a given measurement.

The processor instantiates many circuits that rely on pulse clocks for correct operation [3]. Pulsed writes are self-timed, gaining no margin as the clock frequency is reduced.
Pulsed writes become difficult as the power supply voltage is reduced, approaching variation-affected Vt. Consequently, only the very peak of the pulse is effective for writing. Since the processor core is a port from 90nm to 65nm, substantial schedule savings are realized by using the existing pulse latch structure (Fig. 4.7.1) and placement. Overall latch size was not allowed to increase, which prevented qualifying the PMOS feedback transistor.
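As a rough sanity check on the variation budget quoted earlier (6.4σ per circuit, 10 million non-static circuits, under 1% yield loss), the sketch below estimates die-level yield loss from a per-circuit sigma margin. It is not the authors' analysis flow: it assumes each circuit fails independently with a one-sided Gaussian tail probability, and the function names `tail_probability` and `yield_loss` are illustrative only.

```python
import math

def tail_probability(sigma: float) -> float:
    """One-sided Gaussian tail probability beyond `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def yield_loss(sigma: float, circuits: int) -> float:
    """Probability that at least one of `circuits` circuits fails.

    Assumes independent failures, each with the one-sided tail probability
    above (a simplifying assumption, not a statement about the real flow).
    """
    p_fail = tail_probability(sigma)
    # 1 - (1 - p)^n, evaluated with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(circuits * math.log1p(-p_fail))

N_CIRCUITS = 10_000_000  # non-static circuit count quoted in the text

for sigma in (5.0, 6.0, 6.4):
    loss = yield_loss(sigma, N_CIRCUITS)
    print(f"{sigma:.1f} sigma per circuit -> estimated yield loss {loss:.4%}")
```

Under these assumptions the 6.4σ case lands well below the 1% target, while a 5σ margin would not come close. The real analysis distributes the variation across each FET's length, width, and Vt according to its effect on the measurement, which this sketch does not model.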

Entry-latches also use a pulse to capture data at a clock edge and produce monotonic data for dynamic logic. Implicitly pulsed entry-latches are eliminated in favor of an explicitly pulsed topology, shown in Fig. 4.7.2. Here, the NMOS feedback transistor is qualified to improve low-voltage precharge.

There are two circuits used to create pulse clocks. Clock gaters typically drive a large number of latches along a short wire. A transfer gate is added to the internal delay chain that determines the clock pulse width, as shown in Fig. 4.7.3. The slope of the transfer gate output is matched to the latches, enabling the pulse width to track latch write characteristics across PVT. All gaters have programmable pulse widths. In new designs, a 20% wider pulse is software programmable (Fig. 4.7.3), and ported designs have a metal option for an 8% wider pulse.

Local pulse generators drive 1 or 2 latches when a larger gater is not practical. We converted from the simple construct shown in Fig. 4.7.1 to a more complex circuit because the transistors are small and subject to more random variation than the larger devices in the gaters. The new structure requires the output pulse to reach the high VIH of a feedback inverter before turning off the pulse. To further ensure a full-rail pulse, the output drive is 3× that of the pulse gater, relative to the allowed output loading. Despite the robustness of the new pulse generator, local pulse generator usage dropped from 40,000 to 300 per core because of the increased size of the circuit. Most usages were replaced by AND gates that are tuned to pass pulses from a gater with minimal distortion. These AND gates provide a local enable with timing characteristics similar to the local pulse generator.

Since a majority of the core is ported from 90nm, we could not make a drastic reduction in per-core SER. This forced the design team to be very aggressive in addressing SER in the new parts of the design. More than 99% of the latches in the system interface are SER-hardened, and 33% of the core latches have also been converted. Furthermore, the system interface protects 34 unique small register files, including all CAMs, with SER-hardened storage. DICE-type structures [4] are chosen for both types of storage. Figures 4.7.4 and 4.7.5 compare these hardened structures to unprotected latches and register file cells.

The SER-hardened latch of Fig. 4.7.4 uses a write structure that achieves timing characteristics comparable to the unprotected pulse latch of Fig. 4.7.1. The write mechanism presented here pulls the feedback out of the way, resulting in excellent clock-to-output delays, which are 25 to 30% faster than writing two same-sense nodes. Figure 4.7.5 shows the register file cell. Added wire length and the double-write construction make the cell harder to write, but write timing is generally not critical. The load on the read bitlines and on the address lines in the decoder increases by less than 10%, making timing only slightly worse.

The SER benefits of a DICE structure depend on a layout that physically separates the storage node diffusions. We use a layout that keeps the latch nodes that are sensitive to multi-bit strikes a minimum of 1.1µm apart. This results in a 100× reduction in SER over the unprotected latch in Fig. 4.7.1. From an error-rate perspective, the 865,000 SER-hardened latches are equivalent to 8,650 unprotected latches. The more compact register file cell achieves an 80× improvement on 600Kb of storage. This spacing-dependent strategy is expected to work for 1 to 2 more process generations. The costs of this SER protection are 34 to 44% area increases and 25% higher power consumption. Including the entire overhead, the area penalty for a 32b×32-entry, 1-read, 1-write register file is 25%. In comparison, ECC costs the same 25% area for small amounts of storage, and presents difficult pipelining and timing problems that are avoided with this SER-hardened design.

This chip overcomes significant circuit design challenges arising from process variability, operating voltage, power envelope, high circuit counts, pulsed writes, and SER limitations. Achieving the design goals required extensive simulation and targeted design changes as well as broad use of SER-tolerant structures.

Acknowledgements: Shawn Davidson, Laura Dietz, Kevin Duda, Jon Lachman, Casey Little, Charles Morganti, John Wanek


References:
[1] B. Stackhouse et al., “A 65nm 2-Billion Transistor Quad-Core Intel® Itanium® Processor”, ISSCC Dig. Tech. Papers, pp. 92-93, Feb. 2008.
[2] S. Naffziger et al., “The Implementation of a 2-core, Multi-threaded Itanium® Family Microprocessor”, IEEE J. of Solid State Circuits, vol. 41, no. 1, pp. 197-209, 2006.
[3] S. Naffziger et al., “The Implementation of the Itanium 2 Microprocessor”, IEEE J. of Solid State Circuits, vol. 37, no. 11, pp. 1448-1460, 2002.
[4] P. Hazucha et al., “Measurements and Analysis of SER-Tolerant Latch in a 90-nm Dual-Vt Process”, IEEE J. of Solid State Circuits, vol. 39, no. 9, pp. 1536-1543, 2004.

2008 IEEE International Solid-State Circuits Conference

978-1-4244-2010-0/08/$25.00 ©2008 IEEE


ISSCC 2008 / February 4, 2008 / 4:45 PM

Figure 4.7.1: Pulse latch with local pulse generator.

Figure 4.7.2: Entry-latch changes.

(Schematics for Figures 4.7.1 and 4.7.2 not reproduced. Recovered labels: latch, scan, 90nm local pulse generator, 65nm pulse generator, CK, CKS, PCK, "Can be NAND/NOR", "High VIH".)

Figure 4.7.3: Programmable pulse gater with transfer gate.

(Schematic not reproduced. Recovered labels: Pulse Gater, 90nm Delay Line, 65nm Programmable Delay Line, Latch, "Slope matched to latch storage node".)

Figure 4.7.4: SER-hardened pulse latch.

(Schematic and layout not reproduced. Recovered labels: Primary feedback, Scan, 1.1µm.)

Parameter                      % of unprotected
area                           134%
pck load                       136%
flowthru (in to q)             98%
pck→out (pck rise to q)        96%
setup (in before pck fall)     106%
SER FIT                        100× better
Standby power                  127%
Active power                   125%

Figure 4.7.5: SER-hardened register file cell.

Parameter                      % of unprotected
Word line dimension            100
Bit line dimension             144
Write time                     135
Read bit cap                   110
Read word cap                  103
Write bit cap                  167
Write word cap                 164
SER FIT                        80× better
Standby power                  124

Figure 4.7.6: Die photo and statistics.

Circuit type                   Instances per Tukwila
Register file cells            4,400,000
Pulse latches                  1,500,000
Dynamic circuits               1,000,000
Entry-latches                  570,000
Voltage converters             340,000
Gaters                         110,000
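The SER figures quoted for Figures 4.7.4 and 4.7.5 (100× and 80× better) can be combined with the storage counts given in the text to reproduce the "equivalent unprotected" numbers. The sketch below does that arithmetic and, for comparison with the ECC remark, counts the check bits of a Hamming SEC-DED code on a 32-bit word. The choice of SEC-DED is an assumption made here for illustration; the paper does not say which ECC scheme it compares against, and the function names are hypothetical.

```python
def equivalent_unprotected(count: float, improvement: float) -> float:
    """Number of unprotected elements with the same total error rate."""
    return count / improvement

def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC-DED code protecting `data_bits` bits."""
    r = 0
    # Single-error correction needs 2^r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall-parity bit adds double-error detection.
    return r + 1

# Values quoted in the text: 865,000 hardened latches at 100x,
# and 600Kb of hardened register-file storage at 80x.
latches = equivalent_unprotected(865_000, 100)
regfile_kb = equivalent_unprotected(600, 80)
print(f"Hardened latches behave like {latches:,.0f} unprotected latches")
print(f"Hardened register files behave like {regfile_kb} Kb of unprotected cells")

# Assumed SEC-DED comparison for a 32-bit entry (storage bits only).
check = secded_check_bits(32)
print(f"SEC-DED on 32 data bits: {check} check bits ({check / 32:.1%} extra bits)")
```

The check-bit overhead covers only the added storage bits; encode and decode logic, and the pipelining cost mentioned in the text, come on top of it.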

