
iitRACE: A Memory Efficient Engine for Fast Incremental Timing Analysis and Clock Pessimism Removal

Chaitanya Peddawad, Aman Goel, Dheeraj B, Nitin Chandrachoodan

Department of Electrical Engineering
Indian Institute of Technology Madras, India

2015 IEEE/ACM International Conference on Computer Aided Design

Peddawad et al. iitRACE 1 / 37


Outline

1 Recap
    Introduction: STA, Incremental Timing and CPPR
    Problem Formulation
2 Algorithm
    Incremental Timing: Identifying Incremental Cones and Resolving Dependencies
    Block-based topologically guided CPPR and Path Extraction Using Dynamic Path Reduction
3 Experimental Results
    Accuracy and Memory Efficiency
    Test Coverage and Pin Coverage
    Challenges & Improvements
4 Conclusion


Introduction
STA, Incremental Timing, CPPR
Faster turnaround time for timing analysis in the presence of design changes
Clock network as a source of pessimism: pessimism-free timing information (CPPR) must also be updated incrementally

Figure: Example for CPPR and incremental changes to the design
Introduction
Problem Formulation

Given: a circuit in standard file formats (.v, .lib, .spef, .timing)

Task: apply the incremental changes specified in .ops to the circuit, then perform timing analysis and CPPR in the affected regions using the least time and resources


Algorithm
Flow Chart

Perform Incremental Changes
Identify Incremental Cones
Incremental Timing:
    Resolve Dependencies
    Pre-CPPR Timing Propagation
Block Based CPPR:
    Back Traversal
    Find Credit
    Front Traversal
    Build NegPinList
Path Extraction
Post-CPPR Timing Propagation
Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Cone-end Points
Every cone in the circuit can be associated with a unique primary output or flip-flop input pin, henceforth referred to as a cone-end point (CEP)

Figure: Cone-end Points: out1, FF3:D
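The CEP definition can be made concrete with a small sketch. It assumes a pin-level fan-in map; the netlist below is hypothetical and only loosely modelled on the figure, and the function names are illustrative rather than part of iitRACE.

```python
from collections import deque

def cone_end_points(primary_outputs, ff_data_pins):
    """Cone-end points are the primary outputs plus the flip-flop D pins."""
    return sorted(set(primary_outputs) | set(ff_data_pins))

def fanin_cone(cep, fanin, stop_pins):
    """Collect every pin that can reach `cep`, stopping at `stop_pins`
    (flip-flop outputs and primary inputs), which bound the cone."""
    cone, queue = {cep}, deque([cep])
    while queue:
        pin = queue.popleft()
        if pin in stop_pins:
            continue  # do not walk through a flip-flop or past an input
        for pred in fanin.get(pin, ()):
            if pred not in cone:
                cone.add(pred)
                queue.append(pred)
    return cone

# Hypothetical connectivity (assumption, not the exact circuit of the figure).
fanin = {
    "FF3:D": ["u4:z"],
    "u4:z": ["FF2:Q", "u1:z"],
    "u1:z": ["FF1:Q", "inp1"],
    "out1": ["u5:z"],
    "u5:z": ["u1:z", "inp2"],
}
stops = {"FF1:Q", "FF2:Q", "inp1", "inp2"}
ceps = cone_end_points(["out1"], ["FF3:D"])
cone = fanin_cone("FF3:D", fanin, stops)
```

Each CEP names exactly one cone, so a set of CEPs is enough to say which cones need re-timing.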


Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Identifying Incremental CEPs

Figure: Adding a gate to the circuit


Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Identifying Incremental CEPs

Figure: Adding a gate to the circuit: Disconnect net from u4:z


Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Identifying Incremental CEPs

Figure: Adding a gate to the circuit: insert u6


Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Identifying Incremental Nets

Figure: Adding a gate to the circuit: insert net 3 & connect net 3 to u4:z


Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Identifying Incremental Nets


The timing information at the input/output pins of a net may depend on the parameters of another incremental net
The values can be updated only after the dependencies between incremental nets are resolved

Figure: Incremental Nets (inset: the original XOR2_X1 gate versus the incremental change, where u6 is added at its output; the affected region is marked ICC)
Algorithm
Incremental Timing: Identifying Incremental Cones and Resolving Dependencies

Resolving Dependencies
Find a set of incrementally affected, mutually independent nets using a modified breadth-first search
In the example, nets 1 & 2 are identified as independent nets and FF3:D as the incremental CEP
The cone of FF3:D is hence an incremental cone of change (ICC)

Figure: Nets 1, 2, 3: incremental nets; nets 1 & 2: dependencies resolved
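The slides do not spell out how the breadth-first search is modified, so the following is only one plausible sketch. It assumes a net-level fan-out map and treats an incremental net as dependent when it is reachable from another incremental net. The net name `n_u6` and the helper `resolve_dependencies` are made up for illustration.

```python
from collections import deque

def resolve_dependencies(incremental_nets, fanout, cep_of_net):
    """Return (independent_nets, incremental_ceps).

    fanout:     net -> list of nets immediately downstream of it
    cep_of_net: net -> cone-end point it drives directly, if any
    """
    dependent, ceps = set(), set()
    for start in incremental_nets:
        seen, queue = {start}, deque([start])
        while queue:
            net = queue.popleft()
            if net in cep_of_net:
                ceps.add(cep_of_net[net])  # this cone must be re-timed
            for nxt in fanout.get(net, ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in incremental_nets:
                    dependent.add(nxt)  # reachable from another incremental net
                queue.append(nxt)
    independent = sorted(set(incremental_nets) - dependent)
    return independent, sorted(ceps)

# Nets 1, 2, 3 follow the slide; "n_u6" (the output net of u6) is an assumed name.
fanout = {"net1": ["net3"], "net2": ["net3"], "net3": ["n_u6"]}
cep_of_net = {"n_u6": "FF3:D"}
independent, ceps = resolve_dependencies({"net1", "net2", "net3"}, fanout, cep_of_net)
```

On this toy input the sketch reproduces the slide's outcome: nets 1 and 2 are independent, and FF3:D is the only incremental CEP.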
Algorithm
Incremental Timing: Incremental AT/RAT/Slack Update

Pre-CPPR Timing Propagation in ICC


Update AT with a single block-based front traversal, starting from nets 1 & 2
Update RAT/slack with a back traversal from the incremental CEPs (FF3:D)
A static run covers the full circuit; an incremental run covers only the ICC

Figure: Timing propagation in ICC
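A minimal late-mode sketch of the two traversals, restricted to the pins of one ICC. Edge delays, arrival times and the required time are invented numbers, and `propagate_late` is an illustrative helper rather than the engine's interface. It uses the textbook block-based rules: AT is the maximum over fan-in, RAT is the minimum over fan-out.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def propagate_late(edges, at_start, rat_end):
    """edges: {(u, v): delay}. at_start must hold the AT of every entry pin
    of the cone; rat_end must hold the RAT of every cone-end point."""
    preds, succs = {}, {}
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
        preds.setdefault(u, [])
    graph = {v: [u for u, _ in p] for v, p in preds.items()}
    order = list(TopologicalSorter(graph).static_order())
    at = dict(at_start)
    for v in order:                      # front traversal: latest arrival
        if preds[v]:
            at[v] = max(at[u] + d for u, d in preds[v])
    rat = dict(rat_end)
    for u in reversed(order):            # back traversal: earliest requirement
        if u in succs:
            rat[u] = min(rat[v] - d for v, d in succs[u])
    slack = {p: rat[p] - at[p] for p in order}
    return at, rat, slack

# Made-up delays for the ICC of FF3:D (nets 1 and 2 are the starting points).
edges = {("net1", "u4:z"): 2, ("net2", "u4:z"): 3,
         ("u4:z", "u6:z"): 1, ("u6:z", "FF3:D"): 1}
at, rat, slack = propagate_late(edges, {"net1": 5, "net2": 3}, {"FF3:D": 8})
```

Only pins inside the cone are visited, which is the difference between the incremental run and a static run over the full circuit.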
Algorithm
Block-based topologically guided CPPR

Step 1 - Back Traversal


Block-based levelised back traversal from a CEP until a FF or PI is reached
Concept of criticalAT & criticalRAT
Set RAT (pre-CPPR) and criticalRAT at the pins encountered, and mark the cone

Figure: CPPR Algorithm - Back traversal from FF3:D




Algorithm
Block-based topologically guided CPPR

Step 2 - Identifying Common Points & Finding Credits


Identify the common point of the data path and the clock path for each pair of launching and capturing FFs: cp13, cp23

Figure: CPPR Algorithm - Identifying Common Points


Algorithm
Block-based topologically guided CPPR

Step 2 - Identifying Common Points & Finding Credits


Credit at a launching FF can be found using the equations
\[ \mathit{credit}_{hold} = at^{L}_{cp} - at^{E}_{cp} \]
\[ \mathit{credit}_{setup} = at^{L}_{cp} - at^{E}_{cp} - \left( at^{L}_{clk\,src} - at^{E}_{clk\,src} \right) \]
where L and E denote late and early mode, cp the common point and clk src the clock source

Figure: CPPR Algorithm - Finding Credits
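A worked instance of the two credit formulas, reading them as credit_hold = at_cp(L) - at_cp(E) and credit_setup = credit_hold - (at_clksrc(L) - at_clksrc(E)). The arrival times are invented and the function name is illustrative.

```python
def cppr_credits(at_cp_late, at_cp_early, at_src_late, at_src_early):
    """Return (credit_hold, credit_setup) for one launch/capture FF pair."""
    credit_hold = at_cp_late - at_cp_early
    # The setup credit additionally discounts the early/late spread
    # already present at the clock source.
    credit_setup = credit_hold - (at_src_late - at_src_early)
    return credit_hold, credit_setup

# Made-up arrival times at the common point and at the clock source.
credit_hold, credit_setup = cppr_credits(
    at_cp_late=30, at_cp_early=25, at_src_late=2, at_src_early=0)
```

With these numbers the hold credit is 5 and the setup credit is 3.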


Algorithm
Block-based topologically guided CPPR

Step 3 - Updating fakeAT


fakeAT: adjust AT values so that they carry the credit information at a pin
\[ \mathit{fake\_at}^{L(E)}_{FF:Q} = at^{L(E)}_{FF:Q} \mp \mathit{credit}^{L(E)} \]

Figure: CPPR Algorithm - Setting fakeAT at output of launching FFs
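The sign convention in the fakeAT formula can be sketched as follows. It assumes the late-mode credit is the setup credit and the early-mode credit is the hold credit; the helper name and the numbers are illustrative.

```python
def fake_at(at_q, credit, mode):
    """Fold the CPPR credit into the AT at a launching FF's Q pin.

    Late mode ('L') subtracts the credit and early mode ('E') adds it,
    so the slack computed downstream improves by the credit in both modes.
    """
    if mode == "L":
        return at_q - credit
    if mode == "E":
        return at_q + credit
    raise ValueError("mode must be 'L' or 'E'")

late = fake_at(40, 3, "L")    # late AT pulled earlier by the credit
early = fake_at(12, 5, "E")   # early AT pushed later by the credit
```

Since late slack is RAT minus AT and early slack is AT minus RAT, both adjustments remove pessimism rather than add it.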


Algorithm
Block-based topologically guided CPPR

Step 4 - Front Traversal


Block-based levelised front traversal within the colored cone
Propagate fakeAT while setting criticalAT
Propagating fakeAT ensures that the worst post-CPPR slack is propagated

Figure: CPPR Algorithm - Front Traversal


Algorithm
Block-based topologically guided CPPR

Step 4.1 - Building NegPinList During Front Traversal


Find the updated slacks using fakeAT and RAT values
NegPinList & Global Path Table (GPT): initially empty
Add failing pins to NegPinList

NegPinList (pin suffix L/E: late/early mode)
Pins      Slack
u1:aL     -33
u1:bL     -25
u4:bL     -15

Figure: CPPR Algorithm - Building NegPinList


Algorithm
Block-based topologically guided CPPR

Step 4.1 - Building NegPinList During Front Traversal


Find the updated slacks using fakeAT and RAT values
NegPinList & Global Path Table (GPT): initially empty
Add failing pins to NegPinList

NegPinList (pin suffix L/E: late/early mode)
Pins      Slack
u1:aL     -33
u1:bL     -25
u4:bL     -15
u2:bL     -28
u2:aE     -13
u3:aL     -23
u4:aL     -33
u5:aL     -28
u5:aE     -13
u5:bL     -23
u5:cL     -33
FF3:DL    -33
FF3:DE    -13

Figure: CPPR Algorithm - Building NegPinList
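A sketch of how failing pins might be collected during the front traversal. It assumes slack is RAT minus fakeAT in late mode and fakeAT minus RAT in early mode. The fakeAT and RAT values are invented so that two of the slacks match the table; the slide does not give them.

```python
def build_neg_pin_list(fake_at, rat, mode):
    """Return [(pin, mode, slack)] for pins with negative post-CPPR slack,
    most critical first."""
    neg = []
    for pin, at in fake_at.items():
        slack = rat[pin] - at if mode == "L" else at - rat[pin]
        if slack < 0:
            neg.append((pin, mode, slack))
    return sorted(neg, key=lambda entry: entry[2])

# Invented late-mode values; "u6:a" passes and is therefore not listed.
fake_at_late = {"u1:a": 133, "u1:b": 125, "u6:a": 90}
rat_late = {"u1:a": 100, "u1:b": 100, "u6:a": 100}
neg_pin_list = build_neg_pin_list(fake_at_late, rat_late, "L")
```

Only failing pins are stored, so the list stays small when few pins violate timing.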


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
0     -     FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
1     P1    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
2     P2    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
3     P3    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
4     P4    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
5     P5    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Step 5 - Extract Paths from NegPinList


Step  Path  NegPinList
6     P6    FF3:DL, u5:cL, u4:aL, u1:aL, u5:aL, u2:bL, u1:bL, u5:bL, u3:aL, u4:bL, FF3:DE, u5:aE, u2:aE


Algorithm
Path Extraction from NegPinList of a cone

Paths Skipped

Name  Slack  Mode  Path
P7    -20    L     FF2:Q→u1:b→u1:z→u2:b→u2:z→u5:a→u5:z→FF3:D
P8    -15    L     FF2:Q→u1:b→u1:z→u3:a→u3:z→u5:b→u5:z→FF3:D

Figure: Path Extraction - Paths skipped
Algorithm
Path Extraction from NegPinList of a cone

Paths Extraction: Redundant Paths


No pin on path P7 (or P8) has P7 (or P8) as the worst path through it in the cone under consideration
It is highly probable that correcting only the reported paths (P1 to P6) would correct the skipped paths (P7 and P8) as well

Figure: Path Extraction - Paths skipped
Algorithm
Path Extraction from NegPinList of a cone

Paths Extraction: Redundant Paths


Most importantly, the proposed algorithm ensures that every reported path is the most critical path through some pin on it, for some logic cone in the circuit. Regular algorithms that report the N worst paths in a circuit do not ensure this, and therefore typically report many paths that are in some sense redundant

Figure: Path Extraction - Paths skipped
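One way to read the reduction from a NegPinList to a short list of reported paths: visit pins from most to least critical, report the worst path through the first uncovered pin, then drop every pin that path covers. This is an interpretation of the slides, not the authors' code. The slacks come from the NegPinList table, but the path membership and the `worst_path_through` callback are assumptions.

```python
def extract_paths(neg_pin_list, worst_path_through):
    """neg_pin_list: [(pin, slack)]. Returns [(path, slack)] with one entry
    per pin that is not already covered by a more critical reported path."""
    remaining = {pin for pin, _ in neg_pin_list}
    reported = []
    for pin, slack in sorted(neg_pin_list, key=lambda entry: entry[1]):
        if pin not in remaining:
            continue  # its worst path has already been reported
        path = worst_path_through(pin)
        reported.append((path, slack))
        remaining.difference_update(path)  # dynamic reduction of the list
    return reported

# Assumed path membership, for illustration only.
worst = {"u1:a": ("u1:a", "u4:a"), "u4:a": ("u1:a", "u4:a"),
         "u2:b": ("u2:b", "u5:a"), "u5:a": ("u2:b", "u5:a")}
neg = [("u1:a", -33), ("u4:a", -33), ("u2:b", -28), ("u5:a", -28)]
reported = extract_paths(neg, worst.__getitem__)
```

A path that is not the worst path through any pin, such as P7 or P8, is never generated under this reading.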
Experimental Results
Accuracy and Memory Efficiency

TAU Results: Comparison of iitRACE With Other Academic Timers


Average value accuracy of 99% with the lowest memory requirement (on average 2X lower than the first-place timer)
Memory peaks: corner cases
On-the-fly interconnect delay computation
Pin slack, criticalAT, criticalRAT as the only implicit representation of a path

[Plot: Memory (GB) vs #Gates; series: iitRACE, UI-Timer 2.0, iTimerC 2.0]

Figure: Memory Usage Comparison
Experimental Results
Test Coverage

Coverage
A measure of the number of unique CEPs among the pins in the set of worst paths
Higher coverage: our algorithm typically captures a much larger number of such CEPs than the actual N worst paths in the circuit
Beneficial in identifying all the failing cones
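The coverage measure can be sketched as a count of distinct path end points. It assumes each path is a list of pins whose last pin is its CEP; the paths below are hypothetical.

```python
def coverage(paths):
    """Number of unique cone-end points among the given paths."""
    return len({path[-1] for path in paths})

# Three worst paths that all end at the same CEP cover a single cone...
top_n = [["u1:a", "FF3:D"], ["u1:b", "FF3:D"], ["u2:b", "FF3:D"]]
# ...while the same number of paths spread over cones covers more.
per_cone = [["u1:a", "FF3:D"], ["u4:b", "out1"], ["u2:a", "FF1:D"]]
top_n_cov, per_cone_cov = coverage(top_n), coverage(per_cone)
```

This is why a report of the same length can point at more failing cones.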


Experimental Results
Test Coverage

#Unique CEPs vs #Paths

[Plots: four panels (b19, cordic, des_perf, mgc_edit_dist); x-axis: Path Count (10 to 10K); y-axis: # Unique CEPs; series: iitRACE, iTimerC 2.0]

Figure: Test coverage comparison against actual top N worst paths

Higher coverage of pins: an aid to identifying critical regions
Experimental Results
Post-contest Improvements in Performance Without Compromising the Accuracy

Post-contest speed-up: 10X

                     MMR (GB)          CPU (s)
Benchmark            C       Post-C    C       Post-C
b19                  3.02    3.33      426     132
cordic               0.87    0.84      60      31
des_perf             4.19    1.74      189     94
edit_dist            1.98    2.16      562     84
fft                  2.38    0.63      44      26
leon2                9.92    12.4      13800   582
leon3mp              8.20    10.17     4920    463
mgc_edit_dist        1.82    2.14      566     79
mgc_matrix_mult      2.01    2.37      239     82
netcard              9.33    11.6      3800    516
tau_cordic_core      0.27    0.21      8       7
tau_crc32d16N        0.11    0.11      1       1
tau_softusb_navre    0.19    0.2       13      7
tau_tip_master       0.63    0.65      39      18
vga_lcd_1            13.22   2.76      742     409
vga_lcd_2            1.54    1.76      243     64
Total                59.68   53.07     25680   2590

MMR: Maximum Memory Requirement, CPU: Runtime (s)
C: Contest, Post-C: Post-Contest

Resolved the memory peaks: corner cases
Cut-off technique: search space reduction
Algorithmic optimization: sparse table implementation for finding the lowest common ancestor (common point), improvement in incremental circuit connection
Multithreading: parallel processing of the cones
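The sparse-table idea for finding the common point can be sketched with the standard Euler-tour reduction of lowest common ancestor to range-minimum queries. This is the textbook construction, not necessarily the one used in iitRACE, and the clock-tree shape below is assumed.

```python
def build_lca(parent, root):
    """Preprocess a tree given as child -> parent; return an lca(u, v) query."""
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)
    euler, depth, first = [root], [0], {root: 0}
    stack = [(root, 0, iter(children.get(root, ())))]
    while stack:                          # iterative Euler tour
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:                     # step back up to the parent
                euler.append(stack[-1][0])
                depth.append(stack[-1][1])
        else:
            first[child] = len(euler)
            euler.append(child)
            depth.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, ()))))
    # table[k][i] = index of the shallowest tour entry in euler[i : i + 2**k]
    table, k, n = [list(range(len(euler)))], 1, len(euler)
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        table.append([min(prev[i], prev[i + half], key=depth.__getitem__)
                      for i in range(n - (1 << k) + 1)])
        k += 1

    def lca(u, v):
        lo, hi = sorted((first[u], first[v]))
        j = (hi - lo + 1).bit_length() - 1
        best = min(table[j][lo], table[j][hi - (1 << j) + 1],
                   key=depth.__getitem__)
        return euler[best]
    return lca

# Assumed clock tree: clk -> inv1 -> buf2 -> buf3, with one FF clock pin per stage.
parent = {"inv1": "clk", "buf2": "inv1", "buf3": "buf2",
          "FF1:ck": "inv1", "FF2:ck": "buf2", "FF3:ck": "buf3"}
lca = build_lca(parent, "clk")
cp13, cp23 = lca("FF1:ck", "FF3:ck"), lca("FF2:ck", "FF3:ck")
```

After O(n log n) preprocessing each query is O(1), which matters when a common point is needed for every launching/capturing FF pair.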


Conclusion

Proposed a novel memory-efficient incremental timing analysis technique with a block-based CPPR framework
The approach is rooted in a highly practical perspective: we accurately report only non-redundant critical paths
Significantly higher coverage of cone-end points corresponding to critical paths than regular algorithms for worst-path reporting. Designers can use this as an additional aid to identify critical areas in the circuit from a path-correction perspective
Future extensions: static/incremental statistical timing analysis


Acknowledgements

TAU 2015 Contest Organizers
- Jin Hu, IBM Corp.
- Greg Schaeffer, IBM Corp.
- Vibhor Garg, Cadence
We would also like to thank the authors of UI-Timer 2.0 and iTimerC 2.0 for sharing their TAU 2015 Contest timer binaries
