The TAU 2015 Contest: Incremental Timing and CPPR Analysis

Jin Hu (IBM Corp.) [Speaker]
Greg Schaeffer (IBM Corp.)
Vibhor Garg (Cadence)

Sponsors: TAU 2015 Workshop, March 12-13, 2015
Motivation of Incremental Analysis

Performance: faster turnaround time for timing analysis in the presence of design changes.

Incremental Timing (Arrival Time Update Example)
[Figure: a small combinational design (primary inputs inp1, inp2, inp3; gates i1, i2, i3, n1, n2; primary output out). A delay change δ at inp2 affects only a subset of pins, so only that fanout cone and the resulting critical path need to be re-timed.]

[Chart: runtime improvement (x) of incremental over full timing versus the number of random operations: about 50.43x at 5 operations, 46.50x at 70, 15.87x at 500, and 8.88x at 6,000.]
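The pruned arrival-time propagation suggested by the figure can be written down compactly: starting from the changed pins, re-propagate arrival times through the fanout cone and stop on any branch whose value did not move. This is a minimal sketch, not the contest reference implementation; the graph representation (fanin/fanout/delay dictionaries) and the convergence-based pruning are assumptions of the sketch, and a production timer would levelize the updates.

```python
from collections import deque

def update_arrival_times(fanin, fanout, delay, at, changed):
    """Incremental (late-mode) arrival-time update: re-propagate only
    through the fanout cone of the changed pins, pruning branches whose
    arrival time did not move.  All data-structure names are illustrative:
    fanin[v]/fanout[u] list neighbor pins, delay[(u, v)] is the arc delay,
    at maps pin -> max arrival time (already updated at the changed pins)."""
    queue = deque(changed)
    while queue:
        u = queue.popleft()
        for v in fanout.get(u, ()):
            # recompute the max over all fanin arcs of v
            new_at = max(at[w] + delay[(w, v)] for w in fanin[v])
            if new_at != at[v]:      # value moved: keep propagating
                at[v] = new_at
                queue.append(v)      # otherwise the cone is pruned here
    return at
```

With few changed pins, most of the design is never visited, which is where the large speedups on small operation counts come from.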
Motivation of Incremental Analysis (continued)

Incremental CPPR (Credit Computation Update Example)
[Figure: a small sequential design (primary inputs inp1, inp2; flip-flops f1, f2, f3; clock-tree buffers i1-i7; gate n1; primary output out; clock source clk). Delay changes δ1, δ2, δ3 at different points in the clock tree alter the common launch/capture clock-path segments of the affected flip-flop pairs, so only those CPPR credits need to be recomputed.]

Broadened research scope: enables timing-driven applications, such as placement and routing.
Theory and implementation: requires both algorithmic intelligence and agile coding techniques.
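The CPPR credit being recomputed in the example is the pessimism accumulated on the clock-path segment shared by the launch and capture flip-flops: that segment cannot be simultaneously fast and slow, so the late-minus-early delay over it is credited back to the test slack. A minimal sketch, with illustrative data structures (paths given as pin lists starting at the clock source):

```python
def cppr_credit(launch_clk_path, capture_clk_path, early_delay, late_delay):
    """Pessimism credit for one test: sum of (late - early) arc delays
    over the common prefix of the launch and capture clock paths."""
    # length of the common prefix (pins shared by both clock paths)
    n = 0
    for a, b in zip(launch_clk_path, capture_clk_path):
        if a != b:               # first divergence: common prefix ends
            break
        n += 1
    # accumulate pessimism over the arcs of that common prefix
    credit = 0.0
    for u, v in zip(launch_clk_path[:n - 1], launch_clk_path[1:n]):
        credit += late_delay[(u, v)] - early_delay[(u, v)]
    return credit
```

When a delay change lands below the divergence point of a pair, its credit is untouched; only pairs whose common prefix contains the change need recomputation, which is what makes the update incremental.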
TAU 2015 Contest: Bigger and Better

Develop a block-based and path-based timer that supports incremental timing and incremental CPPR analysis†:
  - Delay and output slew calculation
  - Separate rise/fall transitions
  - Block-/gate-level capabilities
  - Path-level capabilities
  - Statistical / multi-corner capabilities
  - Incremental capabilities
  - Industry-standard input formats

†CPPR: the process of removing inherent but artificial pessimism from timing tests and paths.
TAU 2015 Contest Overview

Provided to contestants:
  - Detailed documentation: timing and CPPR tutorials, delay modeling, file formats, evaluation rules, etc.
  - Benchmarks (based on TAU 2013, ISPD 2013, ICCAD 2014, and Cadence benchmarks):
      design connectivity: Verilog (.v)
      early and late libraries: Liberty (.lib)
      parasitics: SPEF (.spef)
      wrapper file: (.tau2015)
      assertions: (.timing)
      operations: (.ops)
  - Open-source code (previous contest winners, utilities):
      1. PATMOS 2011: NTU-Timer
      2. TAU 2013: IITiMer
      3. TAU 2014: UI-Timer
      4. ISPD 2013: .spef/.lib parsers

Evaluation:
  - Block-based and path-based post-CPPR timing analysis results (.output)
  - Accuracy against golden results (from an industry timer); runtime and memory usage for performance

Time frame: ~4 months.
Contest scope: only hold, setup, and RAT tests; no latches (flush segments); single-source clock tree.
Input Files – Design Infrastructure (industry-standard formats)

Verilog (.v): design connectivity

  module simple (IN1, IN2, IN3, OUT);
    input IN1, IN2, IN3;
    output OUT;
    wire n1;
    NAND2_X1 nand2 (.a(IN1), .b(IN2), .o(n1));
    NOR2_X1  nor2  (.a(n1),  .b(IN3), .o(OUT));
  endmodule

[Figure: the instantiated netlist; nand2 drives net n1 into nor2, which drives OUT.]

Liberty (.lib): gate library

  cell ("NAND2_X1") {
    pin ("o") {
      direction : output;
      capacitance : 0.0;
      timing () {  /* fall-to-rise from a */
        cell_fall ("scalar") {        /* delay values */
          values ("26.064");
        }
        fall_transition ("scalar") {  /* output slew values */
          values ("30.216");
        }
        timing_sense : negative_unate;
        related_pin : "a";
      }
    }
  }

SPEF (.spef): design parasitics

  *D_NET IN2 1.6
  *CONN
  *P IN2 I
  *I nand2:b I
  *CAP
  1 IN2 0.2
  2 IN2:1 0.5
  3 nand2:b 0.9
  *RES
  1 IN2 IN2:1 1.4
  2 IN2:1 nand2:b 1.6
  *END

[Figure: the RC network for net IN2: resistors R1 (IN2 to internal node IN2:1) and R2 (IN2:1 to nand2:b), with grounded capacitors C1-C3 at the three nodes.]
Input Files – Assertions (.timing)

Provides initial conditions for timing. Values are specific to [Early/Late] x [Rise/Fall]:
  - Arrival times for each primary input
  - Input slews for each primary input
  - Required arrival times for each primary output
  - Asserted pin capacitances for each primary output
  - Clock period for the clock source (if the design is sequential)

Example assertions (.timing), columns Early-Rise, Early-Fall, Late-Rise, Late-Fall:

             ER    EF    LR    LF
  at   IN1    0     0     5     5
  at   IN2    0     0     1     1
  at   IN3    0     0    10    12
  slew IN1   10    15    20    25
  slew IN2   30    30    40    40
  slew IN3   30    30    40    40
  rat  out   10    10    20    20
  load out    4
Input Files – Operations (.ops): Design-Level Operations

Gate-level:
  insert_gate    : creates a new gate of the given cell type
  remove_gate    : removes an existing gate
  repower_gate   : changes the power level of a gate to the given cell type
Net-level:
  insert_net     : creates a new net
  remove_net     : removes an existing net
  read_spef      : changes or adds parasitics of the net(s) in the given SPEF file
Pin-level:
  disconnect_pin : decouples a pin from any net
  connect_pin    : links a pin to a net
Input Files – Operations (.ops): Design-Level Operations Example

Task: add an inverter into the design.
Initial design: IN drives u1 (pins a, o), whose output drives net n1 into u2 (pins a, o), whose output drives OUT.

Operation sequence:
  disconnect_pin u2:o
  insert_gate u3 INV_X1
  insert_net n2
  connect_pin u2:o n2
  connect_pin u3:a n2
  connect_pin u3:o OUT

First, u2's output is detached from OUT; then the new inverter u3 and net n2 are created; finally, n2 is connected between u2:o and u3:a, and u3:o drives OUT.

Final design: IN -> u1 -> n1 -> u2 -> n2 -> u3 -> OUT.
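The operation sequence above can be replayed against a toy netlist model to see what state each command mutates. Everything below is illustrative: a real timer would also maintain timing-graph arcs and invalidate the affected timing data on each edit.

```python
class Netlist:
    """Minimal netlist model, just enough to replay the .ops example."""
    def __init__(self):
        self.gates = {}     # gate name -> cell type
        self.net_of = {}    # pin name  -> net name
        self.nets = {}      # net name  -> set of connected pins

    def insert_gate(self, gate, cell):
        self.gates[gate] = cell

    def insert_net(self, net):
        self.nets[net] = set()

    def connect_pin(self, pin, net):
        self.net_of[pin] = net
        self.nets[net].add(pin)

    def disconnect_pin(self, pin):
        net = self.net_of.pop(pin)   # decouple the pin from its net
        self.nets[net].discard(pin)

# Build the initial design: IN -> u1 -> n1 -> u2 -> OUT
d = Netlist()
for g, c in [('u1', 'INV_X1'), ('u2', 'INV_X1')]:
    d.insert_gate(g, c)
for net in ['IN', 'n1', 'OUT']:
    d.insert_net(net)
d.connect_pin('u1:a', 'IN'); d.connect_pin('u1:o', 'n1')
d.connect_pin('u2:a', 'n1'); d.connect_pin('u2:o', 'OUT')

# Replay the slide's inverter-insertion sequence
d.disconnect_pin('u2:o')
d.insert_gate('u3', 'INV_X1')
d.insert_net('n2')
d.connect_pin('u2:o', 'n2')
d.connect_pin('u3:a', 'n2')
d.connect_pin('u3:o', 'OUT')
```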
Input Files – Operations (.ops): Timing Queries

Block-based:
  report_at    [pin] [RFEL] : prints the arrival time at the pin for [Rise/Fall][Early/Late]
  report_rat   [pin] [RFEL] : prints the required arrival time at the pin for [Rise/Fall][Early/Late]
  report_slack [pin] [RFEL] : prints the post-CPPR slack at the pin for [Rise/Fall][Early/Late]

Path-based:
  report_worst_paths [pin name] [n] : prints the top [n] paths with the worst post-CPPR slack,
  either (i) in the design, or (ii) through [pin name]
Output File (.output)

Block-based queries (report_at, report_rat, report_slack) each print a single value.
Path-based queries (report_worst_paths) print, per path: path type [RAT/Setup/Hold], post-CPPR path slack, path length, mode [Early/Late], and the path itself as a sequence of [pin RFEL] entries.

Example (on the simple nand2/nor2 design):

  Operations file:
    report_at -pin nor2:o -fall -early
    report_worst_paths -pin nand2:o -fall -late

  Output file:
    15.85
    Path 1: RAT -14.7 6 L
      OUT R, nor2:o R, nor2:a F, nand2:o F, nand2:a R, IN1 R
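A naive way to realize report_worst_paths is explicit path enumeration; contest-grade timers use implicit enumeration instead, but a DFS sketch shows the semantics (here slack = required arrival time at the endpoint minus the path arrival time, and all names are illustrative assumptions):

```python
def report_worst_paths(fanout, delay, rat, at0, sources, n):
    """Enumerate every source-to-endpoint path by DFS, score each by its
    slack, and return the n worst (most negative slack first).
    fanout[u] lists sink pins, delay[(u, v)] is the arc delay, rat maps
    endpoints to required arrival times, at0 gives source arrival times."""
    paths = []

    def dfs(pin, path, arrival):
        sinks = fanout.get(pin, ())
        if not sinks:                       # endpoint reached
            paths.append((rat[pin] - arrival, list(path)))
            return
        for nxt in sinks:
            path.append(nxt)
            dfs(nxt, path, arrival + delay[(pin, nxt)])
            path.pop()

    for s in sources:
        dfs(s, [s], at0.get(s, 0.0))
    paths.sort(key=lambda p: p[0])          # worst slack first
    return paths[:n]
```

Explicit enumeration is exponential in the worst case; it is shown only to pin down what the query reports.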
Benchmarks: Phase 1

  - 13 based on TAU 2013 v1.0 combinational benchmarks (~10^0 – ~10^3 gates)
  - 10 based on TAU 2013 v1.0 sequential benchmarks (~10^0 – ~10^3 gates)

Added a randomized clock tree [TAU 2014]:
  BRANCH(CLOCK, initial FF)
  for each remaining FF:
      select a random location L in the current tree
      BRANCH(L, FF)
  where BRANCH(src, sink) creates a buffer chain from src to sink.
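The BRANCH construction above can be sketched directly. The buffer naming, fixed chain length, and random source are assumptions of the sketch (the benchmarks used randomized chains):

```python
import random

def build_clock_tree(clock, flops, chain_len=3, seed=1):
    """Randomized clock tree per the slides: BRANCH the clock to the first
    FF, then branch each remaining FF off a random node already in the
    tree.  Returns the tree as a list of (driver, sink) edges."""
    edges, nodes, nbuf = [], [clock], 0

    def branch(src, sink):
        nonlocal nbuf
        prev = src
        for _ in range(chain_len):       # buffer chain from src toward sink
            nbuf += 1
            buf = f"buf{nbuf}"
            edges.append((prev, buf))
            nodes.append(buf)            # buffers are candidate branch points
            prev = buf
        edges.append((prev, sink))       # final hop lands on the FF clock pin

    rng = random.Random(seed)
    branch(clock, flops[0])
    for ff in flops[1:]:
        branch(rng.choice(nodes), ff)    # random location in the current tree
    return edges
```

Because every branch point is an existing tree node, the result is always a single-source tree, matching the contest scope.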
Benchmarks: Phase 1 (continued)

  - 11 based on TAU 2013 v2.0 benchmarks (~10^3 – ~10^5 gates)

Added the same randomized clock tree construction [TAU 2014].
Benchmarks: Phase 2

  - 7 based on ISPD 2013 benchmarks (~10^3 – ~10^5 gates)
  - 1 based on Cadence benchmarks (~10^2 gates)
  - 1 based on ICCAD 2014 benchmarks (~10^6 gates)

Added the same randomized clock tree construction [TAU 2014]; here the clock tree contains both inverters and buffers.
Operations File: Phase 1 and Phase 2

Macro operations, each expanded into [numGates] repetitions of the listed .ops commands:
  RepowerGate(numGates)   : repower_gate
  AddBuffer(numGates)*    : insert_gate, insert_net, disconnect_pin, connect_pin, read_spef
  RemoveBuffer(numGates)* : disconnect_pin, connect_pin, remove_gate, remove_net
  QueryTiming(numPaths)   : report_at, report_rat, and report_slack (verbose mode) at all
                            PIs, POs, and internal pins; report_worst_paths [numPaths]
  *buffers are not inserted into or removed from the clock tree

Example sequence:
  QueryTiming(0);
  RepowerGate(1);    AddBuffer(1);    QueryTiming(3);
  RepowerGate(10);   AddBuffer(5);    QueryTiming(3);
  RepowerGate(20);   AddBuffer(50);   QueryTiming(3);
  RepowerGate(1000); AddBuffer(5000); QueryTiming(3);
  RepowerGate(3);    AddBuffer(3);    QueryTiming(3);
  RemoveBuffer(5);   QueryTiming(3);
  RemoveBuffer(500); QueryTiming(0);
Benchmarks: Evaluation

  - 7 based on released benchmarks (~10^2 – ~10^5 gates)
  - 3 based on Cadence benchmarks (~10^2 – ~10^5 gates)
  - 6 based on ICCAD 2014 benchmarks (~10^5 – ~10^6 gates)

Added the same randomized clock tree construction [TAU 2014]; the clock tree contains both inverters and buffers. Several of the evaluation benchmarks were hidden (not released).
Operations File: Evaluation

  QueryTimingEval(numPins, numPaths, numFFs, numPathsFF):
    report_at at all PIs, POs, and [numPins] internal pins
    report_slack at all PIs, POs, and [numPins] internal pins
    report_worst_paths [numPaths]
    report_worst_paths -pin [numPathsFF], for each of [numFFs] flip-flops

Example sequence:
  QueryTimingEval(10000, 1000, 100, 10);
  RepowerGate(10000); AddBuffer(10000);
  QueryTimingEval(10000, 1000, 100, 10);
  RemoveBuffer(50000); RepowerGate(100000); AddBuffer(100000);
  QueryTimingEval(10000, 10000, 100, 10);
Evaluation

Accuracy (compared to golden results), over all timing points (PIN) and paths (PATH) in design D:
  - Block-based value accuracy A(PIN): arrival time and post-CPPR slack, scored per value
    by the absolute difference from golden:
        difference (ps)   score
        [0, 0.1]          100
        (0.1, 0.5]         80
        (0.5, 1.0]         50
        (1, inf)            0
  - Path-based accuracy A(PATH): path correctness (75%) and post-CPPR path slack (25%)
  - Accuracy of the design: A(D) = (A(PIN) + A(PATH)) / 2

Runtime factor (relative), where R(D) is the contestant's runtime on D:
  RF(D) = (MAX_R(D) - R(D)) / (MAX_R(D) - MIN_R(D))

Memory factor (relative), where M(D) is the contestant's peak memory usage on D:
  MF(D) = (MAX_M(D) - M(D)) / (MAX_M(D) - MIN_M(D))

Composite design score:
  score(D) = A(D) x (70 + 20 RF(D) + 10 MF(D))

The overall contestant score is the average over all design scores.
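The scoring formulas combine into a few lines. This sketch assumes accuracy values on a 0-100 scale and per-design MIN/MAX runtime and memory taken over all contestants; those conventions are inferences from the slide, not a published reference implementation:

```python
def design_score(a_pin, a_path, r, r_min, r_max, m, m_min, m_max):
    """Composite score per the evaluation slide: accuracy averaged over
    pin and path accuracy, runtime/memory factors normalized against the
    best (MIN) and worst (MAX) contestant on the design, then combined
    with 70/20/10 weights."""
    a = (a_pin + a_path) / 2                 # A(D), on a 0-100 scale
    rf = (r_max - r) / (r_max - r_min)       # RF(D) in [0, 1]
    mf = (m_max - m) / (m_max - m_min)       # MF(D) in [0, 1]
    return a / 100 * (70 + 20 * rf + 10 * mf)
```

A perfect-accuracy contestant who is also fastest and leanest on a design scores exactly 100; accuracy dominates, since it alone caps the score at 70% of its weight even for the slowest entry.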
TAU 2015 Contestants [12 teams, 10 universities, 6 countries]

  University                                   Country   Team Name
  Georgia Institute of Technology              USA       GeorgiaTech
  University of Illinois at Urbana-Champaign   USA       UI-Timer 2.0
  Drexel University                            USA       SAPPER
  Moscow Institute of Physics and Technology   Russia    School146
  University of Thessaly                       Greece    U-Thessaly
  Indian Institute of Technology, Madras       India     Timer_IITM
  Indian Institute of Technology, Madras       India     iitRACE
  Indian Institute of Technology, Hyderabad    India     IIITimer
  Universidade Federal do Rio Grande do Sul    Brazil    UFRGS-Brazil
  National Chiao Tung University               Taiwan    iTimerC 2.0
  National Tsing Hua University                Taiwan    NN
  National Tsing Hua University                Taiwan    NTU
Contestant Results – Accuracy

Many entered, few survived – but excellent quality: 12 teams narrowed to 4, with the top 3 team results presented. Each team was tested against 16 benchmarks, with only 1 crash out of all 48 runs.

Average (raw) accuracy scores (0 = worst, 100 = best):
  Team   Value Accuracy   Path Accuracy   Final Accuracy
  T1     99.8             98.8            99.3
  T2     98.4             44.1            71.2
  T3     93.4             92.1            92.7

Perfect value accuracy on 9 of 16 benchmarks; the vast majority of accuracy scores were 99%+.

Final contestant score (70% accuracy + 20% runtime + 10% memory), accuracy component only:
  T1: 69.5    T2: 49.9    T3: 64.9
Contestant Results – Performance

Average (raw) runtime and peak memory usage scores (0 = worst, 100 = best):
  Team   Runtime   Memory Usage
  T1     70.3       6.3
  T2     23.3      92.3
  T3     77.1      90.8

Evaluation machine: 16x Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30 GHz, with ample memory.
T1 used 8 threads [>1 hour total runtime]; T2 used 1 thread [~9 hours]; T3 used 1 thread [<1 hour].
On the leon2 benchmark, T1 used ~350 GB of memory, T2 ~10 GB, and T3 ~30 GB.

Final contestant score (70% accuracy + 20% runtime + 10% memory):
  T1: 84.2    T2: 63.7    T3: 89.4
Acknowledgments

Greg Schaeffer, Vibhor Garg [TAU 2015 Contest Committee]
Debjit Sinha, Igor Keller [TAU 2015 Workshop Committee]

The TAU 2015 contestants: this contest would not have been successful without your hard work and dedication.
and the winners are…
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

Third Place Award, presented to
Chaitanya Peddawad, Aman Goel, Dheeraj B, and Nitin Chandrachoodan
for iitRACE (IIT Madras, India)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

Second Place Award, presented to
Tsung-Wei Huang and Martin D. F. Wong
for UI-Timer 2.0 (University of Illinois at Urbana-Champaign, USA)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

First Place Award, presented to
Pei-Yu Lee, Cheng-Ruei Li, Wei-Lun Chiu, Yu-Ming Yang, and Iris Hui-Ru Jiang
for iTimerC 2.0 (National Chiao Tung University, Taiwan)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
Contest Achievements and Reflections

  - Modern (large) benchmarks and infrastructure for timing research
      Enables extensions for timing features (e.g., statistical analysis, cell modeling)
      Supports timing-driven applications (e.g., placement and routing)
  - Open-source timers with both block-based and path-based capabilities
      Support industry-standard input formats (e.g., Verilog, Liberty)
      High-performance and robust software – multi-threaded implementations
  - Recognition of talent
      Intelligent, motivated, driven (and polite!) contestants
      Teams capable of solving difficult problems in a few months – not easy!
  - Open forum for directed and relevant industry problems
      Platform for proposing challenging problems – TAU 2016 and beyond