Professional Documents
Culture Documents
I. I NTRODUCTION
The driving force behind the rapid growth of the VLSI technology has been the constant reduction of the feature size of VLSI devices. The feature size decreased from about 2m in 1985 to 0.350.5m today (1996). Such continual miniaturization of VLSI devices
has strong impact on the VLSI technology in several ways. First, the
device density increases rapidly the total number of transistors on
a single VLSI chip has increased from less than 500,000 in 1985 to
over 10 million today. Second, the interconnect delay becomes much
more significant. According to the simple scaling rule described in [1],
when the devices and interconnects are scaled down in all three dimensions by a factor of S, the intrinsic gate delay is reduced by a factor of
S, the delay of local interconnects (such as connections between adjacent gates) remains the same, but the delay of global interconnects
increases by a factor of S2. As a result, the interconnect delay has become the dominating factor in determining system performance. In
many systems designed today, as much as 50% to 70% of clock cycle
are consumed by interconnect delays. This percentage will continue
to rise as the feature size decreases further.
Not only do interconnects become more important, they also become much more difficult to model and optimize in the deep submicron VLSI technology, as the distributed nature of the interconnects
has to be considered. For the conventional technology with the feature
size of 1m or above, the interconnect resistance in most cases is negligible compared to the driver resistance. In this case, the interconnect
delay is determined by the driver resistance times the total interconnect and loading capacitance. Therefore, conventional optimization
techniques focus on reducing the driver resistance using driver, gate,
and transistor sizing, and minimizing the interconnect capacitance by
minimum-length, minimum-width routing. In the deep submicron design technology, however, the interconnect resistance is comparable to
the driver resistance in many long signal nets. Therefore, the intercon-
II. D EVICE
AND I NTERCONNECT
D ELAY M ODELS
t s0 i
k Path s0 i
Rk Cap k
(1)
H s
h t e
st
dt
i 0
1 i i
s
i!
t i h t dt
(2)
of the circuit. In practice, q 5 is commonly used. In general, however, it is not possible to represent the poles and residues in H s explicitly in terms of design parameters of the interconnect in a closedform expression, which makes the AWE method difficult to use for
interconnect optimization directly.1 Therefore, the researchers have
recently focused on the simple case of q 2, known as the two-pole
model, to approximate the delay explicitly using the first three moments m0 (which is normalized), m1 , and m2. The reader may refer
to [6] for more detailed discussions. Note that the two-pole model can
capture the basic inductive effect. The expressions used in the twopole model are usually much more complex than the Elmore delay
model. However, it is possible to use the two-pole model for interconnect optimization with similar degree of efficiency but higher accuracy
than the Elmore delay model. Other delay metrics based on higher order moments, such as the central moments and the explicit RC delay
using the first three moments, are summarized in [6].
1
i!
t i h t dt
(3)
H s
i 1
ki
pi
(4)
h t
k i e pi t
(5)
i 1
moments mi of H s to those of H s [27]. The choice of order q depends on the accuracy required but is always much less than the order
tf
k n L
min d VDD
(6)
tf 2
tt
6
Vthn
1 2V
DD
(7)
where tt is the input transition time (more specifically, the input rise
time in this case) and Vthn is the threshold voltage of n-transistor. Since
this model provides a closed-form expression, it has also been widely
used for device layout optimization.
1 Sensitivity-based methods have been proposed to use AWE for fast timing
analysis to greedily guide the optimization process to a local optima.
The values of C1 , C2 and R in a -model (see Fig. 1(c)) can be computed as follows:
C total
tt
(b)
Ceff
tt
R
tt
C2
tt
(a)
(c)
The slope model uses a one-dimensional table to compute the effective driver resistance based on the concept of rise-time ratio [26].
The effective resistance of a driver depends on the transition time of
the input signal, the loading capacitance, and the size of the driver.
In this model, the output load and transistor size are first combined
into a single value called the intrinsic rise-time of the driver, which
is the rise-time at the output under the step input. The input rise-time
of the driver is then divided by the intrinsic rise-time of the driver to
produce the rise-time ratio of the driver. The effective resistance is
represented as a piece-wise linear function of the rise-time ratio and
stored in a one-dimensional table. Given a driver, one first computes
its rise-time ratio and then calculates its effective resistance R d by interpolation according to its rise-time ratio from the one-dimensional
table. The driver rise-time delay is computed by multiplying the effective resistance with the total capacitance.
Another commonly used driver delay model pre-characterizes the
driver delay of each type of drivers in terms of the input transition time
tt , and the total load capacitance CL in the form of k-factor equations
[30, 29], such as:
tf
k1 k2 CL tt
k1 k2 CL tt
y22 y3 C2 y1
y22 y3
y23 y32
(10)
After obtaining a -model for the interconnect, the effective capacitance can be computed iteratively from R, C1 and C2 in the -model
(Fig. 1(c) and (d)) using the following expression:
(d)
C1
td f
C1
k3 CL3 k4 CL k5
(8)
k3 CL2 k4 CL k5
(9)
where k1 5 and k1 5 are determined based on accurate circuit simulation (e.g. using SPICE) and linear regression or least square fits.
In general, a look-up table can be used to characterize the delay of
each type of
gate. A typical entry in the table can be of the follow
ing form: tt CL td f t f . Given input transition time tt and output
loading capacitance CL , the look-up table for a specific gate provides
the delay and rise/fall time td f t f . The table look-up approach can
be very accurate, if one can afford the time and space to generate a
detailed multi-dimensional table for each gate.
All these driver delay models use the loading capacitance for delay computation. As the first order approximation, the loading capacitance can be simply computed as the total capacitance of the interconnect and the sinks (Fig. 1(a) and (b)). However, not all the capacitance
of the routing tree and the sinks is seen by the driver due to the effect
of interconnect resistance shielding, especially in deep submicron design with fast logic gates of lower driver resistance. The effective capacitance model was proposed to first use a -model [24] (Fig. 1(c))
to better approximate the driving point admittance at the root of the
interconnect (or equivalently, the output of the driver), and then compute iteratively the effective capacitance seen by the driver, denoted
Ce f f , using the k-factor equations.
The -model of an interconnect is constructed using the first three
moments y1, y2 and y3 of the driving point admittance. The three moments of the driving point admittance are computed recursively in a
bottom-up fashion, starting from the leaf nodes of the interconnect.
Ce f f
C2 C1 1
R C1
tD tx 2
tD tx
tx
e R C 1 1 e R C1
R C1 2
tx tD tx 2
(11)
A. Driver Sizing
A chain of cascaded drivers is usually used at the source of an interconnect tree for heavy capacitive load. The driver sizing problem
is to determine both the number of driver stages and the size for each
driver. Using the simple switch-level RC model in Eqn. (6) and ignoring the capacitance of the driver output and the wire connecting to
consecutive drivers, one can show that if the loading capacitance is CL
and the stage number
is N, the optimal stage ratio at each stage should
early work TILOS [15] used the simple switch-level model for transistors, formulated the transistor sizing problem as a posynomial program, and applied a greedy sensitivity based method. The sensitivity
of a transistor is defined to be the delay reduction due to a unit increment of its size. The algorithm starts with a minimum-sized solution,
and timing analysis is applied. The transistor with the largest sensitivity is increased by a user defined factor and then timing analysis is applied again. This procedure terminates when the timing specification
is satisfied or all sensitivities are zero or negative. Recent advances
in transistor sizing include the use of more accurate transistor delay
model with consideration of the input waveform slope, and the use
of linear programming, convex programming, or other non-linear programming techniques for computing a global optimal solution. These
results are summarized in [6].
The gate sizing problem includes both the continuous and the discrete gate sizing problems. The continuous gate sizing problem assumes that all transistors in a gate can be scaled by a common factor,
which is called the size of a gate. It is very similar to the transistor
sizing problem, but has much lower complexity for a given design,
since all transistors in a gate are scaled by the same factor. As a result, more accurate delay models and global optimization techniques
are often used for continuous gate sizing, as reviewed in [6]. This formulation is useful for parameterized cell generation.
The discrete gate sizing problem assumes that each gate has a discrete set of pre-designed implementations (cells) as in a given cell library, and one needs to choose an appropriate cell for each gate for
performance optimization. This optimization is often performed as
part of the technology mapping procedure in logic synthesis. But in
deep submicron design it needs to be applied in connection with layout design so that a good estimation of the interconnect load is available. The discrete gate sizing problem has been shown to be NP-hard.
Optimal solutions are only applicable to special circuits (such as trees
and series-parallel circuits) based on dynamic programming. Heuristic methods have developed for general circuits based on sensitivity
analysis and/or mathematical programming. The reader may refer to
[6] for more details.
C. Transistor Ordering
The transistor ordering problem is to find the best ordering of
(series-connected) transistors in each gate to minimize the delay
and/or the power. This technique is useful as the signal arrival times
at a set of equivalent input transistors of a gate may be different, and
proper ordering can reduce the signal delay at the gate output. Both exhaustive methods and heuristic methods were proposed to search for
an optimal ordering. When applied to the entire circuit, it needs to be
combined with circuit timing analysis so that when a gate is optimized,
the signal arrival times at its inputs are known. The reader may refer to
[3] and [28] for more details. Transistor ordering has no (or little) area
penalty, but the reduction on delay is limited (usually around 5%).
B. Wiresizing Optimization
It was first shown in [10, 11] that when wire resistance becomes significant, as in the deep submicron design, proper wire-sizing can effectively reduce the interconnect delay. Assuming each wire has a set of
discrete wire widths, their work presented an optimal wire-sizing algorithm for a single-source RC interconnect tree to minimize the sum
of weighted delays from the source to timing-critical sinks under the
Elmore delay model. They showed that an optimal wiresizing solution satisfies the monotone property, the separability, and the dominance property. Based on the dominance property, the lower (or upper) bounds of the optimal wire widths can be computed efficiently by
iterative local refinement, starting from a minimum-width solution (or
maximum-width solution for computing upper bounds). Each local refinement operation refines the width of an edge in the routing tree assuming all other edge widths are fixed. The lower and upper bounds
usually meet, which leads to an optimal wiresizing solution. Otherwise, a dynamic programming based method is used to compute the
optimal solution within the lower and upper bounds. This method is
very efficient, capable of handling large interconnect structures, and
leads to substantial delay reduction. It has been extended to optimize
the routing trees with multiple drivers, routing trees without a priori
segmentation of long wires, and to meet the target delays using Lagrangian relaxation. The reader may refer to [6] for more details.
An alternative approach to wiresizing optimization computes an optimal wiresizing solution using bottom-up merging and top-down selection [20]. At each node v, a set of irredundant wiresizing solutions
of the subtree rooted at v is generated by merging and pruning the irredundant wiresizing solutions of the subtrees rooted at the children
B. Buffer Insertion
We feel that the most promising approach to performance optimization is to consider the interaction between devices and interconnects,
and optimize both of them at the same time. This section discusses the
recent advances in this area.
f X
where
p 0 q 0 i 1 j 1 j
a pi xi
0
a pi xi
p
i xi
0 and bq j x j
L X
q
bq j x j x j
a pi xi
xip
(12)
[5] that the dominance property holds for both simple and general CHposynomial programs, and the local refinement operation can be applied to compute the lower and upper bounds of an optimal solution
efficiently. Based on this general result, the work in [5] is able to perform simultaneous transistor and wire sizing efficiently given a general netlist (not limited to buffered trees). A significant advantage of
the CH-posynomial formulation is that it can handle more accurate
transistor models, including both simple analytical models or more accurate table-lookup based models obtained from detailed simulation to
consider the effect of the waveform slope, which leads to better optimization results.
Other studies on simultaneous device and wire sizing include using
higher order RC delay models for the interconnect by either matching
to the target moments or using a q-pole transfer function for sensitivity
analysis. The reader may refer to [6] for more details.
spect to an increase of xi . (ii) bq j x j is a function of x j . It monotonically decreases with respect to an increase of x j , but bq j x j xqj
still monotonically increases with respect to an increase of x i . When
the
coefficient functions in Eqn. (12) are constants, the function
f X is called a simple CH-posynomial. The class of simple CHposynomials is a subset of posynomial functions defined in [13]. The
optimization problem of minimizing a simple/general CH-posynomial
is called a simple/general CH-posynomial program. It was shown in
rithm enables us to study the interaction between topology optimization, buffer insertion, and wire sizing, and leads to many interesting
observations. For example, we observed that the delay reduction due
to wire sizing decreases as more buffers are inserted. Also, it is possible to construct a buffered routing tree first (with proper choice of
buffer size), and then perform simultaneous buffer and wire sizing
to produce a final routing solution with comparable signal delay but
much shorter computation time as compared to an all integrated approach.
Other methods have also been proposed for simultaneous topology
construction and wire sizing, including a greedy dynamic wire sizing
during iterative routing tree construction and use of link insertion with
dynamic wire sizing to create non-tree topologies. These algorithms
are summarized in [6].
R EFERENCES
[1] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI,
Addison-Wesley, 1990.
[2] K. D. Boese, A. B. Kahng, and G. Robins, High-performance routing trees with identified critical sinks, Proc. Design Automation Conf.,
1993, pp. 182187.
[3] B. S. Carlson and C.Y. Chen, Performance Enhancement of CMOS
VLSI Circuits by Transistor Reordering, Proc. ACM/IEEE Design Automation Conf., 1993, pp. 361-366.
[4] C.-P. Chen, H. Zhou, and D. F. Wong Optimal Non-Uniform WireSizing under the Elmore Delay Model, Proc. Intl Conf. on ComputerAided Design, Nov. 1996, pp. 38-43.
[5] J. Cong and L. He, An Efficient Approach to Simultaneous Transistor
and Interconnect Sizing, Proc. Intl Conf. on Computer-Aided Design,
Nov. 1996, pp. 181186.
[6] J. Cong, L. He, C.-K. Koh, and P. H. Madden, Performance Optimization of VLSI Interconnect Layout, Integration, the VLSI Journal, Vol.
21, Nos. 1&2, November 1996, pp. 194.
[7] J. Cong, A. B. Kahng, G. Robins, M. Sarrafzadeh, and C. K. Wong,
Provably Good Performance-Driven Global Routing, IEEE Trans. on
Computer-Aided Design, 11(6), June 1992, pp. 739-752.
[8] J. Cong and C.-K. Koh, Simultaneous Driver and Wire Sizing for Performance and Power Optimization, IEEE Trans. on Very Large Scale
Integration (VLSI) Systems, 2(4), December 1994, pp. 408-423.
[9] J. Cong, C.-K. Koh, and K.-S. Leung, Simultaneous Buffer and Wire
Sizing for Performance and Power Optimization, Proc. Intl Symp. on
Low Power Electronics and Design, August 1996, pp. 271276.
[10] J. Cong and K. S. Leung, Optimal Wiresizing Under the Distributed Elmore Delay Model, Proc. Intl. Conf. on Computer-Aided Design, 1993,
pp. 634-639.
[11] J. Cong, K. S. Leung, and D. Zhou, Performance-Driven Interconnect
Design Based on Distributed RC Delay Model, Proc. ACM/IEEE Design Automation Conf., 1993, pp. 606-611.
[12] J. Cong and P. H. Madden, Performance Driven Routing with Multiple Sources, Proc. Intl Symp. on Circuits and Systems, 1995, pp. 1157
1169.
[13] J. G. Ecker, Geometric Programming: Methods, Computations and Applications, SIAM Review, vol. 22, No. 3, July 1980, pp. 338-362.
[14] W. C. Elmore, The Transient Response of Damped Linear Networks
with Particular Regard to Wide-Band Amplifiers, Journal of Applied
Physics, vol. 19, no. 1, Jan. 1948, pp. 5563.
[15] J. P. Fishburn and A. E. Dunlop, TILOS: A Posynomial Programming
Approach to Transistor Sizing, Proc. Intl. Conf. on Computer-Aided
Design, 1985, pp. 326-328.
[16] L. P. P. P. van Ginneken, Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay, Proc. Intl Symp. on Circuits and Systems, 1990, pp. 865-868.
[17] R. Gupta, B. Tutuianu, B. Krauter, and L. T. Pillage, The Elmore Delay
as a Bound for RC Trees with Generalized Input Signals, Proc. 32nd
ACM/IEEE Design Automation Conf., June 1995, pp. 364369.
[18] N. Hedenstierna and K. O. Jeppson, CMOS Circuit Speed and Buffer
Optimization, IEEE Trans. on Computer-Aided Design, 1987, pp. 270281.
[19] A. B. Kahng and G. Robins, On Optimal Interconnections for VLSI,
Kluwer Academic Publishers, 1994.
[20] J. Lillis, C. K. Cheng and T. T. Y. Lin, Optimal Wire Sizing and Buffer
Insertion for Low Power and a Generalized Delay Model, Proc. IEEE
Intl. Conf. on Computer-Aided Design, Nov. 1995, pp. 138-143.
[21] N. Menezes, R. Baldick, and L. T. Pileggi, A Sequential Quadratic Programming Approach to Concurrent Gate and Wire Sizing, Proc. Intl
Conf. on Computer-Aided Design, 1995, pp. 144-151.
[22] N. Menezes, S. Pullela, F. Dartu and L. T. Pillage, RC Interconnect Synthesis - A Moment Fitting Approach, Proc. Intl Conf. on ComputerAided Design, 1994, pp. 418-425.
[23] N. Menezes, S. Pullela, and L. T. Pileggi, Simultaneous Gate and Interconnect Sizing for Circuit-Level Delay Optimization, Proc. 32nd
ACM/IEEE Design Automation Conf., June 1995, pp. 690695.
[24] P. R. OBrien and T. L. Savarino, Modeling the Driving-Point Characteristic of Resistive Interconnect for Accurate Delay Estimation, Proc.
Intl Conf. on Computer-Aided Design, Nov. 1989, pp. 512515.
[25] T. Okamoto and J. Cong, Buffered Steiner Tree Construction with
Wire Sizing for Interconnect Layout Optimization, Proc. Intl Conf. on
Computer-Aided Design, Nov. 1996, pp. 4449.
[26] J. K. Ousterhout, Switch-Level Delay Models for Digital MOS VLSI,
Proc. 21st Design Automation Conf., 1984, pp. 542548.
[27] L. T. Pillage and R. A. Rohrer, Asymptotic Waveform Evaluation for
Timing Analysis, IEEE Trans. on Computer-Aided Design, 9(4), Apr.
1990, pp. 352366.
[28] S. C. Prasad and K. Roy, Circuit Optimization for Minimization of
Power Consumption under Delay Constraint, Proc. Intl workshop on
Low Power Design, April, 1994, pp. 15-20.
[29] J. Qian, S. Pullela, and L. T. Pileggi, Modeling the Effective Capacitance for the RC Interconnect of CMOS Gates, IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, 13(12),
Dec. 1994, pp. 15261535.
[30] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: a
Systems Perspective 2nd ed, Addison-Wesley, 1993.
[31] T. Xue, E. S. Kuh and Q. Yu, A Sensitivity-Based Wiresizing Approach
to Interconnect Optimization of Lossy Transmission Line Topologies,
Proc. IEEE Multi-Chip Module Conf., 1996, pp. 117-121.
[32] D. Zhou, and X. Y. Liu, On the Optimal Drivers for High-Speed Low
Power ICs, to appear in InternationalJournal of High Speed Electronics
and System, 1996.
[33] D. Zhou, F. Tsui, and D. S. Gao, High Performance Multichip Interconnection Design, Proc. 4th ACM/SIGDA Physical Design Workshop,
Apr. 1993, pp. 3243.