The TAU 2015 Contest: Incremental Timing and CPPR Analysis

Jin Hu (IBM Corp.) [Speaker]
Greg Schaeffer (IBM Corp.)
Vibhor Garg (Cadence)

Sponsors: TAU 2015 Workshop, March 12-13, 2015
Motivation of Incremental Analysis

Performance: faster turnaround time for timing analysis in the presence of design changes.

Incremental Timing (Arrival Time Update Example)
[Figure: a small combinational design (primary inputs inp1, inp2, inp3; gates i1, i2, i3, n1, n2; primary output out). A delay change δ at inp2 affects only a subset of pins, so only that fanout cone and the resulting critical path need to be re-timed.]

[Chart: runtime improvement (x) of incremental over full timing versus the number of random operations: about 50.43x at 5 operations, 46.50x at 70, 15.87x at 500, and 8.88x at 6,000.]
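The pruned arrival-time propagation suggested by the figure can be written down compactly: starting from the changed pins, re-propagate arrival times through the fanout cone and stop on any branch whose value did not move. This is a minimal sketch, not the contest reference implementation; the graph representation (fanin/fanout/delay dictionaries) and the convergence-based pruning are assumptions of the sketch, and a production timer would levelize the updates.

```python
from collections import deque

def update_arrival_times(fanin, fanout, delay, at, changed):
    """Incremental (late-mode) arrival-time update: re-propagate only
    through the fanout cone of the changed pins, pruning branches whose
    arrival time did not move.  All data-structure names are illustrative:
    fanin[v]/fanout[u] list neighbor pins, delay[(u, v)] is the arc delay,
    at maps pin -> max arrival time (already updated at the changed pins)."""
    queue = deque(changed)
    while queue:
        u = queue.popleft()
        for v in fanout.get(u, ()):
            # recompute the max over all fanin arcs of v
            new_at = max(at[w] + delay[(w, v)] for w in fanin[v])
            if new_at != at[v]:      # value moved: keep propagating
                at[v] = new_at
                queue.append(v)      # otherwise the cone is pruned here
    return at
```

With few changed pins, most of the design is never visited, which is where the large speedups on small operation counts come from.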
Motivation of Incremental Analysis (continued)

Incremental CPPR (Credit Computation Update Example)
[Figure: a small sequential design (primary inputs inp1, inp2; flip-flops f1, f2, f3; clock-tree buffers i1-i7; gate n1; primary output out; clock source clk). Delay changes δ1, δ2, δ3 at different points in the clock tree alter the common launch/capture clock-path segments of the affected flip-flop pairs, so only those CPPR credits need to be recomputed.]

Broadened research scope: enables timing-driven applications, such as placement and routing.
Theory and implementation: requires both algorithmic intelligence and agile coding techniques.
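The CPPR credit being recomputed in the example is the pessimism accumulated on the clock-path segment shared by the launch and capture flip-flops: that segment cannot be simultaneously fast and slow, so the late-minus-early delay over it is credited back to the test slack. A minimal sketch, with illustrative data structures (paths given as pin lists starting at the clock source):

```python
def cppr_credit(launch_clk_path, capture_clk_path, early_delay, late_delay):
    """Pessimism credit for one test: sum of (late - early) arc delays
    over the common prefix of the launch and capture clock paths."""
    # length of the common prefix (pins shared by both clock paths)
    n = 0
    for a, b in zip(launch_clk_path, capture_clk_path):
        if a != b:               # first divergence: common prefix ends
            break
        n += 1
    # accumulate pessimism over the arcs of that common prefix
    credit = 0.0
    for u, v in zip(launch_clk_path[:n - 1], launch_clk_path[1:n]):
        credit += late_delay[(u, v)] - early_delay[(u, v)]
    return credit
```

When a delay change lands below the divergence point of a pair, its credit is untouched; only pairs whose common prefix contains the change need recomputation, which is what makes the update incremental.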
TAU 2015 Contest: Bigger and Better

Develop a block-based and path-based timer that supports incremental timing and incremental CPPR analysis†:
  - Delay and output slew calculation
  - Separate rise/fall transitions
  - Block-/gate-level capabilities
  - Path-level capabilities
  - Statistical / multi-corner capabilities
  - Incremental capabilities
  - Industry-standard input formats

†CPPR: the process of removing inherent but artificial pessimism from timing tests and paths.
TAU 2015 Contest Overview

Provided to contestants:
  - Detailed documentation: timing and CPPR tutorials, delay modeling, file formats, evaluation rules, etc.
  - Benchmarks (based on TAU 2013, ISPD 2013, ICCAD 2014, and Cadence benchmarks):
      design connectivity: Verilog (.v)
      early and late libraries: Liberty (.lib)
      parasitics: SPEF (.spef)
      wrapper file: (.tau2015)
      assertions: (.timing)
      operations: (.ops)
  - Open-source code (previous contest winners, utilities):
      1. PATMOS 2011: NTU-Timer
      2. TAU 2013: IITiMer
      3. TAU 2014: UI-Timer
      4. ISPD 2013: .spef/.lib parsers

Evaluation:
  - Block-based and path-based post-CPPR timing analysis results (.output)
  - Accuracy against golden results (from an industry timer); runtime and memory usage for performance

Time frame: ~4 months.
Contest scope: only hold, setup, and RAT tests; no latches (flush segments); single-source clock tree.
Input Files – Design Infrastructure (industry-standard formats)

Verilog (.v): design connectivity

  module simple (IN1, IN2, IN3, OUT);
    input IN1, IN2, IN3;
    output OUT;
    wire n1;
    NAND2_X1 nand2 (.a(IN1), .b(IN2), .o(n1));
    NOR2_X1  nor2  (.a(n1),  .b(IN3), .o(OUT));
  endmodule

[Figure: the instantiated netlist; nand2 drives net n1 into nor2, which drives OUT.]

Liberty (.lib): gate library

  cell ("NAND2_X1") {
    pin ("o") {
      direction : output;
      capacitance : 0.0;
      timing () {  /* fall-to-rise from a */
        cell_fall ("scalar") {        /* delay values */
          values ("26.064");
        }
        fall_transition ("scalar") {  /* output slew values */
          values ("30.216");
        }
        timing_sense : negative_unate;
        related_pin : "a";
      }
    }
  }

SPEF (.spef): design parasitics

  *D_NET IN2 1.6
  *CONN
  *P IN2 I
  *I nand2:b I
  *CAP
  1 IN2 0.2
  2 IN2:1 0.5
  3 nand2:b 0.9
  *RES
  1 IN2 IN2:1 1.4
  2 IN2:1 nand2:b 1.6
  *END

[Figure: the RC network for net IN2: resistors R1 (IN2 to internal node IN2:1) and R2 (IN2:1 to nand2:b), with grounded capacitors C1-C3 at the three nodes.]
Input Files – Assertions (.timing)

Provides initial conditions for timing. Values are specific to [Early/Late] x [Rise/Fall]:
  - Arrival times for each primary input
  - Input slews for each primary input
  - Required arrival times for each primary output
  - Asserted pin capacitances for each primary output
  - Clock period for the clock source (if the design is sequential)

Example assertions (.timing), columns Early-Rise, Early-Fall, Late-Rise, Late-Fall:

             ER    EF    LR    LF
  at   IN1    0     0     5     5
  at   IN2    0     0     1     1
  at   IN3    0     0    10    12
  slew IN1   10    15    20    25
  slew IN2   30    30    40    40
  slew IN3   30    30    40    40
  rat  out   10    10    20    20
  load out    4
Input Files – Operations (.ops): Design-Level Operations

Gate-level:
  insert_gate    : creates a new gate of the given cell type
  remove_gate    : removes an existing gate
  repower_gate   : changes the power level of a gate to the given cell type
Net-level:
  insert_net     : creates a new net
  remove_net     : removes an existing net
  read_spef      : changes or adds parasitics of the net(s) in the given SPEF file
Pin-level:
  disconnect_pin : decouples a pin from any net
  connect_pin    : links a pin to a net
Input Files – Operations (.ops): Design-Level Operations Example

Task: add an inverter into the design.
Initial design: IN drives u1 (pins a, o), whose output drives net n1 into u2 (pins a, o), whose output drives OUT.

Operation sequence:
  disconnect_pin u2:o
  insert_gate u3 INV_X1
  insert_net n2
  connect_pin u2:o n2
  connect_pin u3:a n2
  connect_pin u3:o OUT

First, u2's output is detached from OUT; then the new inverter u3 and net n2 are created; finally, n2 is connected between u2:o and u3:a, and u3:o drives OUT.

Final design: IN -> u1 -> n1 -> u2 -> n2 -> u3 -> OUT.
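The operation sequence above can be replayed against a toy netlist model to see what state each command mutates. Everything below is illustrative: a real timer would also maintain timing-graph arcs and invalidate the affected timing data on each edit.

```python
class Netlist:
    """Minimal netlist model, just enough to replay the .ops example."""
    def __init__(self):
        self.gates = {}     # gate name -> cell type
        self.net_of = {}    # pin name  -> net name
        self.nets = {}      # net name  -> set of connected pins

    def insert_gate(self, gate, cell):
        self.gates[gate] = cell

    def insert_net(self, net):
        self.nets[net] = set()

    def connect_pin(self, pin, net):
        self.net_of[pin] = net
        self.nets[net].add(pin)

    def disconnect_pin(self, pin):
        net = self.net_of.pop(pin)   # decouple the pin from its net
        self.nets[net].discard(pin)

# Build the initial design: IN -> u1 -> n1 -> u2 -> OUT
d = Netlist()
for g, c in [('u1', 'INV_X1'), ('u2', 'INV_X1')]:
    d.insert_gate(g, c)
for net in ['IN', 'n1', 'OUT']:
    d.insert_net(net)
d.connect_pin('u1:a', 'IN'); d.connect_pin('u1:o', 'n1')
d.connect_pin('u2:a', 'n1'); d.connect_pin('u2:o', 'OUT')

# Replay the slide's inverter-insertion sequence
d.disconnect_pin('u2:o')
d.insert_gate('u3', 'INV_X1')
d.insert_net('n2')
d.connect_pin('u2:o', 'n2')
d.connect_pin('u3:a', 'n2')
d.connect_pin('u3:o', 'OUT')
```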
Input Files – Operations (.ops): Timing Queries

Block-based:
  report_at    [pin] [RFEL] : prints the arrival time at the pin for [Rise/Fall][Early/Late]
  report_rat   [pin] [RFEL] : prints the required arrival time at the pin for [Rise/Fall][Early/Late]
  report_slack [pin] [RFEL] : prints the post-CPPR slack at the pin for [Rise/Fall][Early/Late]

Path-based:
  report_worst_paths [pin name] [n] : prints the top [n] paths with the worst post-CPPR slack,
  either (i) in the design, or (ii) through [pin name]
Output File (.output)

Block-based queries (report_at, report_rat, report_slack) each print a single value.
Path-based queries (report_worst_paths) print, per path: path type [RAT/Setup/Hold], post-CPPR path slack, path length, mode [Early/Late], and the path itself as a sequence of [pin RFEL] entries.

Example (on the simple nand2/nor2 design):

  Operations file:
    report_at -pin nor2:o -fall -early
    report_worst_paths -pin nand2:o -fall -late

  Output file:
    15.85
    Path 1: RAT -14.7 6 L
      OUT R, nor2:o R, nor2:a F, nand2:o F, nand2:a R, IN1 R
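A naive way to realize report_worst_paths is explicit path enumeration; contest-grade timers use implicit enumeration instead, but a DFS sketch shows the semantics (here slack = required arrival time at the endpoint minus the path arrival time, and all names are illustrative assumptions):

```python
def report_worst_paths(fanout, delay, rat, at0, sources, n):
    """Enumerate every source-to-endpoint path by DFS, score each by its
    slack, and return the n worst (most negative slack first).
    fanout[u] lists sink pins, delay[(u, v)] is the arc delay, rat maps
    endpoints to required arrival times, at0 gives source arrival times."""
    paths = []

    def dfs(pin, path, arrival):
        sinks = fanout.get(pin, ())
        if not sinks:                       # endpoint reached
            paths.append((rat[pin] - arrival, list(path)))
            return
        for nxt in sinks:
            path.append(nxt)
            dfs(nxt, path, arrival + delay[(pin, nxt)])
            path.pop()

    for s in sources:
        dfs(s, [s], at0.get(s, 0.0))
    paths.sort(key=lambda p: p[0])          # worst slack first
    return paths[:n]
```

Explicit enumeration is exponential in the worst case; it is shown only to pin down what the query reports.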
Benchmarks: Phase 1

  - 13 based on TAU 2013 v1.0 combinational benchmarks (~10^0 – ~10^3 gates)
  - 10 based on TAU 2013 v1.0 sequential benchmarks (~10^0 – ~10^3 gates)

Added a randomized clock tree [TAU 2014]:
  BRANCH(CLOCK, initial FF)
  for each remaining FF:
      select a random location L in the current tree
      BRANCH(L, FF)
  where BRANCH(src, sink) creates a buffer chain from src to sink.
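The BRANCH construction above can be sketched directly. The buffer naming, fixed chain length, and random source are assumptions of the sketch (the benchmarks used randomized chains):

```python
import random

def build_clock_tree(clock, flops, chain_len=3, seed=1):
    """Randomized clock tree per the slides: BRANCH the clock to the first
    FF, then branch each remaining FF off a random node already in the
    tree.  Returns the tree as a list of (driver, sink) edges."""
    edges, nodes, nbuf = [], [clock], 0

    def branch(src, sink):
        nonlocal nbuf
        prev = src
        for _ in range(chain_len):       # buffer chain from src toward sink
            nbuf += 1
            buf = f"buf{nbuf}"
            edges.append((prev, buf))
            nodes.append(buf)            # buffers are candidate branch points
            prev = buf
        edges.append((prev, sink))       # final hop lands on the FF clock pin

    rng = random.Random(seed)
    branch(clock, flops[0])
    for ff in flops[1:]:
        branch(rng.choice(nodes), ff)    # random location in the current tree
    return edges
```

Because every branch point is an existing tree node, the result is always a single-source tree, matching the contest scope.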
Benchmarks: Phase 1 (continued)

  - 11 based on TAU 2013 v2.0 benchmarks (~10^3 – ~10^5 gates)

Added the same randomized clock tree construction [TAU 2014].
Benchmarks: Phase 2

  - 7 based on ISPD 2013 benchmarks (~10^3 – ~10^5 gates)
  - 1 based on Cadence benchmarks (~10^2 gates)
  - 1 based on ICCAD 2014 benchmarks (~10^6 gates)

Added the same randomized clock tree construction [TAU 2014]; here the clock tree contains both inverters and buffers.
Operations File: Phase 1 and Phase 2

Macro operations, each expanded into [numGates] repetitions of the listed .ops commands:
  RepowerGate(numGates)   : repower_gate
  AddBuffer(numGates)*    : insert_gate, insert_net, disconnect_pin, connect_pin, read_spef
  RemoveBuffer(numGates)* : disconnect_pin, connect_pin, remove_gate, remove_net
  QueryTiming(numPaths)   : report_at, report_rat, and report_slack (verbose mode) at all
                            PIs, POs, and internal pins; report_worst_paths [numPaths]
  *buffers are not inserted into or removed from the clock tree

Example sequence:
  QueryTiming(0);
  RepowerGate(1);    AddBuffer(1);    QueryTiming(3);
  RepowerGate(10);   AddBuffer(5);    QueryTiming(3);
  RepowerGate(20);   AddBuffer(50);   QueryTiming(3);
  RepowerGate(1000); AddBuffer(5000); QueryTiming(3);
  RepowerGate(3);    AddBuffer(3);    QueryTiming(3);
  RemoveBuffer(5);   QueryTiming(3);
  RemoveBuffer(500); QueryTiming(0);
Benchmarks: Evaluation

  - 7 based on released benchmarks (~10^2 – ~10^5 gates)
  - 3 based on Cadence benchmarks (~10^2 – ~10^5 gates)
  - 6 based on ICCAD 2014 benchmarks (~10^5 – ~10^6 gates)

Added the same randomized clock tree construction [TAU 2014]; the clock tree contains both inverters and buffers. Several of the evaluation benchmarks were hidden (not released).
Operations File: Evaluation

  QueryTimingEval(numPins, numPaths, numFFs, numPathsFF):
    report_at at all PIs, POs, and [numPins] internal pins
    report_slack at all PIs, POs, and [numPins] internal pins
    report_worst_paths [numPaths]
    report_worst_paths -pin [numPathsFF], for each of [numFFs] flip-flops

Example sequence:
  QueryTimingEval(10000, 1000, 100, 10);
  RepowerGate(10000); AddBuffer(10000);
  QueryTimingEval(10000, 1000, 100, 10);
  RemoveBuffer(50000); RepowerGate(100000); AddBuffer(100000);
  QueryTimingEval(10000, 10000, 100, 10);
Evaluation

Accuracy (compared to golden results), over all timing points (PIN) and paths (PATH) in design D:
  - Block-based value accuracy A(PIN): arrival time and post-CPPR slack, scored per value
    by the absolute difference from golden:
        difference (ps)   score
        [0, 0.1]          100
        (0.1, 0.5]         80
        (0.5, 1.0]         50
        (1, inf)            0
  - Path-based accuracy A(PATH): path correctness (75%) and post-CPPR path slack (25%)
  - Accuracy of the design: A(D) = (A(PIN) + A(PATH)) / 2

Runtime factor (relative), where R(D) is the contestant's runtime on D:
  RF(D) = (MAX_R(D) - R(D)) / (MAX_R(D) - MIN_R(D))

Memory factor (relative), where M(D) is the contestant's peak memory usage on D:
  MF(D) = (MAX_M(D) - M(D)) / (MAX_M(D) - MIN_M(D))

Composite design score:
  score(D) = A(D) x (70 + 20 RF(D) + 10 MF(D))

The overall contestant score is the average over all design scores.
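The scoring formulas combine into a few lines. This sketch assumes accuracy values on a 0-100 scale and per-design MIN/MAX runtime and memory taken over all contestants; those conventions are inferences from the slide, not a published reference implementation:

```python
def design_score(a_pin, a_path, r, r_min, r_max, m, m_min, m_max):
    """Composite score per the evaluation slide: accuracy averaged over
    pin and path accuracy, runtime/memory factors normalized against the
    best (MIN) and worst (MAX) contestant on the design, then combined
    with 70/20/10 weights."""
    a = (a_pin + a_path) / 2                 # A(D), on a 0-100 scale
    rf = (r_max - r) / (r_max - r_min)       # RF(D) in [0, 1]
    mf = (m_max - m) / (m_max - m_min)       # MF(D) in [0, 1]
    return a / 100 * (70 + 20 * rf + 10 * mf)
```

A perfect-accuracy contestant who is also fastest and leanest on a design scores exactly 100; accuracy dominates, since it alone caps the score at 70% of its weight even for the slowest entry.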
TAU 2015 Contestants [12 teams, 10 universities, 6 countries]

  University                                   Country   Team Name
  Georgia Institute of Technology              USA       GeorgiaTech
  University of Illinois at Urbana-Champaign   USA       UI-Timer 2.0
  Drexel University                            USA       SAPPER
  Moscow Institute of Physics and Technology   Russia    School146
  University of Thessaly                       Greece    U-Thessaly
  Indian Institute of Technology, Madras       India     Timer_IITM
  Indian Institute of Technology, Madras       India     iitRACE
  Indian Institute of Technology, Hyderabad    India     IIITimer
  Universidade Federal do Rio Grande do Sul    Brazil    UFRGS-Brazil
  National Chiao Tung University               Taiwan    iTimerC 2.0
  National Tsing Hua University                Taiwan    NN
  National Tsing Hua University                Taiwan    NTU
Contestant Results – Accuracy

Many entered, few survived – but excellent quality: 12 teams narrowed to 4, with the top 3 team results presented. Each team was tested against 16 benchmarks, with only 1 crash out of all 48 runs.

Average (raw) accuracy scores (0 = worst, 100 = best):
  Team   Value Accuracy   Path Accuracy   Final Accuracy
  T1     99.8             98.8            99.3
  T2     98.4             44.1            71.2
  T3     93.4             92.1            92.7

Perfect value accuracy on 9 of 16 benchmarks; the vast majority of accuracy scores were 99%+.

Final contestant score (70% accuracy + 20% runtime + 10% memory), accuracy component only:
  T1: 69.5    T2: 49.9    T3: 64.9
Contestant Results – Performance

Average (raw) runtime and peak memory usage scores (0 = worst, 100 = best):
  Team   Runtime   Memory Usage
  T1     70.3       6.3
  T2     23.3      92.3
  T3     77.1      90.8

Evaluation machine: 16x Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30 GHz, with ample memory.
T1 used 8 threads [>1 hour total runtime]; T2 used 1 thread [~9 hours]; T3 used 1 thread [<1 hour].
On the leon2 benchmark, T1 used ~350 GB of memory, T2 ~10 GB, and T3 ~30 GB.

Final contestant score (70% accuracy + 20% runtime + 10% memory):
  T1: 84.2    T2: 63.7    T3: 89.4
Acknowledgments

Greg Schaeffer, Vibhor Garg [TAU 2015 Contest Committee]
Debjit Sinha, Igor Keller [TAU 2015 Workshop Committee]

The TAU 2015 contestants: this contest would not have been successful without your hard work and dedication.
and the winners are…
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

Third Place Award, presented to
Chaitanya Peddawad, Aman Goel, Dheeraj B, and Nitin Chandrachoodan
for iitRACE (IIT Madras, India)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

Second Place Award, presented to
Tsung-Wei Huang and Martin D. F. Wong
for UI-Timer 2.0 (University of Illinois at Urbana-Champaign, USA)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
TAU 2015 Timing Contest on Incremental Timing and CPPR Analysis

First Place Award, presented to
Pei-Yu Lee, Cheng-Ruei Li, Wei-Lun Chiu, Yu-Ming Yang, and Iris Hui-Ru Jiang
for iTimerC 2.0 (National Chiao Tung University, Taiwan)

Igor Keller, General Chair    Debjit Sinha, Technical Chair    Jin Hu, Contest Chair
Contest Achievements and Reflections

  - Modern (large) benchmarks and infrastructure for timing research
      Enables extensions for timing features (e.g., statistical analysis, cell modeling)
      Supports timing-driven applications (e.g., placement and routing)
  - Open-source timers with both block-based and path-based capabilities
      Support industry-standard input formats (e.g., Verilog, Liberty)
      High-performance and robust software – multi-threaded implementations
  - Recognition of talent
      Intelligent, motivated, driven (and polite!) contestants
      Teams capable of solving difficult problems in a few months – not easy!
  - Open forum for directed and relevant industry problems
      Platform for proposing challenging problems – TAU 2016 and beyond