Chieh Lin
DOCTORAL THESIS (PROEFSCHRIFT)
submitted to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.dr. R.A. van Santen, to be defended in public before a committee appointed by the Doctorate Board (College voor Promoties) on Wednesday 20 February 2002 at 16:00
by
Chieh Lin
This thesis has been approved by the promotors: prof.dr.ir. A.H.M. van Roermund and prof.dr.ir. R.H.J.M. Otten. Copromotor: dr.ir. D.M.W. Leenaerts
© Copyright 2002 by Chieh Lin. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright holder.
CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN
Lin, Chieh
Incremental mixed-signal layout generation concepts: theory and implementation / by Chieh Lin. - Eindhoven : Technische Universiteit Eindhoven, 2002. Thesis (Proefschrift). - ISBN 90-386-1880-8 NUGI 832
Keywords (Trefw.): electrical networks ; CAD / analog integrated circuits ; optimization / algorithms ; routing / graphs ; algorithms.
Subject headings: circuit layout CAD / mixed analogue-digital integrated circuits / circuit optimisation / algorithm theory / graph theory.
A few words of sincere appreciation to my wife Jie and my sons Bo Yi and Qi Mo, for their understanding, faith and love.
Doctoral committee (promotiecommissie):
prof.dr.ir. W.M.G. van Bokhoven
prof.dr.ir. P. Dewilde
prof.dr.ir. G. Gielen
prof.dr.ir. P. Groeneveld
dr.ir. J.A. Hegt
dr.ir. D.M.W. Leenaerts
prof.dr.ir. R.H.J.M. Otten
prof.dr.ir. A.H.M. van Roermund
prof.ir. M.P.J. Stevens
Summary
The framework of this thesis encompasses the task of (automatically) generating a physical layout from a given circuit description including specifications. Typically, the performance of the circuit extracted from a layout is not equal to the circuit-level performance before physical design. The reason for this is that the physical design step adds undesirable parasitic components to the original circuit, in effect causing a deviation from the wanted circuit behavior. The ultimate goal is to minimize this deviation with respect to the specifications so that proper operation of the overall design can be warranted. Due to the complexity of the overall task, we mainly focus on the placement and routing problem. The class of circuits under consideration is a particularly difficult one, containing circuits with both analog and digital functionality operating at high frequencies. In this environment, physical phenomena such as crosstalk and process variations are most pronounced. Therefore, innovative automatic layout techniques are required that can produce high-quality layouts while considering all these second-order effects. The most important ingredients to tackle the placement and routing problem successfully are: adequate data structures, appropriate models/representations and efficient algorithms. Moreover, since the placement and routing problems are extremely difficult, strongly coupled problems, heuristic methods are employed to find near-optimal solutions. In this thesis, the simulated annealing algorithm is used as a general-purpose stochastic optimization engine. As this algorithm uses a massive number of iterations before converging to a final solution, it is worthwhile to reduce the amount of information computed during each iteration. This is actually the philosophy behind all novel techniques and algorithms presented in this thesis: use and compute only strictly necessary information to find a solution as efficiently as possible.
In other words, we adopt an incremental approach. The abstract representation we use for block placement is the well-known sequence-pair structure. The routing space is modeled using a sparsified grid graph which is derived directly from a placement. This graph is used in conjunction with a variety of multi-pin routing heuristics for which we also developed visualization tools. During placement, substrate coupling is efficiently taken into account using a novel approach combining the corner-stitching data structure and geometric techniques. We show that incremental placement techniques can substantially reduce computation time. The theoretical analyses are backed up by experimental results. Notably, the obtained gain increases with the problem instance size. A new method for constructing a dynamic global routing graph from a given placement is presented. This graph can be employed effectively in an incremental environment. Furthermore, several graph-based routing heuristics are benchmarked. Significant improvements of new routing heuristics over existing ones are demonstrated over a broad range of synthesized problem instances. Also, incremental techniques that can be applied to these routing heuristics are described in the context of placement perturbations. Finally, we show that substrate-aware placement can also be exploited in an incremental manner without significantly increasing the computational complexity.
Samenvatting (Summary in Dutch)
The framework of this thesis encompasses the automatic generation of a physical layout starting from a given circuit description including specifications. Normally, the behavior of a circuit extracted from a layout deviates from the original circuit behavior. The reason is that a layout adds additional (parasitic) components to the original circuit, which results in deviating circuit behavior. The ultimate goal of a good layout is to minimize these deviations with respect to the given specifications in order to guarantee correct operation of the overall design. Owing to the complexity of this task, we focus mainly on the placement and routing problem. We consider a separate, particularly difficult class of circuits, namely circuits that contain both analog and digital functionality. In this context, phenomena such as crosstalk and process variations play a very important role. Therefore, innovative automatic layout techniques are needed to generate high-quality layouts in which these second-order effects are taken into account. The most important ingredients needed to solve the placement and routing problem adequately are: advanced data structures, efficient models/representations, and efficient algorithms. Since the placement and routing problems are extremely difficult, strongly coupled problems, we use heuristic methods to find (near-)optimal solutions. In this thesis, the so-called simulated annealing algorithm is used as a general stochastic optimization approach. Because this algorithm needs a very large number of iterations to converge to a final result, it is important to drastically reduce the number of arithmetic operations required.
The latter is in fact the philosophy behind all new algorithms and techniques discussed in this thesis: use and compute only the information that is strictly necessary to arrive at a (final) solution as efficiently as possible. In other words, we adopt an incremental approach. The abstract representation used for block placement is the well-known sequence-pair structure. The routing space is modeled by means of a sparse grid graph that is derived directly from a block placement. This graph is used in combination with various multi-pin routing heuristics, for which we have also developed visualization tools. Substrate coupling is accounted for efficiently in the placement using the so-called corner-stitching data structure and geometric techniques. We show that incremental placement techniques can lead to a substantial reduction in computation time. The theoretical analyses are supported by experimental results. The gain that can be achieved in this way increases as the problem instances grow in cardinality. We also present a new method with which a dynamic global routing graph can be obtained from a given block placement. This graph can be used effectively in an incremental environment. Furthermore, we carry out extensive comparative tests with several graph-oriented routing heuristics. We show that some of the new routing heuristics perform significantly better, measured over a fairly large set of synthesized problem instances. We also describe how incremental techniques can be applied to the routing heuristics so that they operate more efficiently in the context of placement perturbations. Finally, we show how substrate couplings can be taken into account in the final layout in an efficient, incremental manner, without significantly increasing the computational complexity.
Contents
Summary
Samenvatting
List of Abbreviations
1 Introduction
1.1 Background
1.2 State of the Art
1.3 Motivation
1.4 Goals of this Research Work
1.5 Thesis Outline
1.6 Main Contributions of this Work
2 Problem Definition
2.1 Top-Down Flow and Bottom-Up Approach
2.1.1 A VLSI Design Cycle
2.1.2 Physical Design
2.1.3 Mixed-Signal Layout Styles
2.1.4 From Circuit to Layout
2.1.5 Layout System Requirements
2.2 The Mapping Problem
2.2.1 High-Level Specifications
2.2.2 Layout System Specifications
2.2.3 Constraint Mapping Problem
2.2.4 High-Level Sensitivities
2.2.5 Lower Level Sensitivities
2.2.6 Sensitivity Computation Problem
2.3 Placement and Routing Constraints
3 Optimization Methods
3.1 VLSI Optimization Methods
3.1.1 Deterministic Algorithms
3.1.2 Stochastic Algorithms
3.1.3 Heuristic Algorithms
3.2 Simulated Annealing
3.2.1 Basic SA Algorithm
3.2.2 Problem Representation
3.2.3 Perturbation Operators
3.2.4 Acceptance and Generation Functions
3.2.5 Temperature Schedule
3.2.6 Stop Criterion
3.2.7 Cost Function
3.3 Concluding Remarks
4 Optimization Approach Based on Simulated Annealing
4.1 Optimization Flow
4.2 Problem Representation
4.2.1 Placement
4.2.2 Routing
4.2.3 Substrate Coupling
4.3 Perturbation Operators
4.4 Acceptance and Generation Functions
4.5 Temperature Schedule
4.6 Stop Criterion
4.7 Cost Function
4.7.1 Implicit Cost Evaluation
4.8 Concluding Remarks
5 Efficient Algorithms and Data Structures
5.1 Computational Model
5.2 Asymptotic Analysis
5.3 Computational Complexity
5.4 Data Structures for CAD
5.4.1 Corner Stitching
5.4.2 Linked List
5.4.3 Splay Tree
5.4.4 Hash Table
5.4.5 Priority Queue
5.4.6 Other Advanced Data Structures
5.5 Concluding Remarks
6 Placement
6.1 Previous Work
6.2 Effective and Efficient Placement
6.3 Representation Generality, Flexibility and Sensitivity
6.4 Sequence Pair Representation
6.5 Graph-Based Packing Computation
6.5.1 Relative Placement Computation
6.5.2 An Efficient Relative Placement Algorithm
6.5.3 Absolute Placement Computation
6.6 Non-Graph-Based Packing Computation
6.6.1 Maximum-Weight Common Subsequence (MWCS) Problem
6.6.2 Maximum-Weight Monotone Subsequence (MWMS) Problem
6.7 Graph-Based Incremental Placement Computation
6.7.1 Incremental Relative Placement Computation
6.7.2 Incremental Relative Placement Computational Complexity
6.7.3 Incremental Absolute Placement Computation
6.7.4 Incremental Absolute Placement Computational Complexity
6.7.5 Average Incremental Computational Complexity
6.8 Implementation Considerations
6.9 Experimental Results
6.9.1 A Single Iteration
6.9.2 Packing Optimization
6.9.3 Conclusions
6.10 Placement-to-Sequence-Pair Mapping
6.11 Constrained Block Placement
6.11.1 Non-Graph-Based Constrained Placement
6.11.2 Implementation Considerations
6.11.3 Experimental Results on Non-Graph-Based Constrained Block Placement
6.11.4 Incremental Graph-Based Constrained Placement
6.12 Concluding Remarks
7 Routing
7.1 The Routing Problem
7.2 Classification of Routing Approaches
7.2.1 Routing Hierarchy
7.2.2 Routing Model
7.3 Previous Work
7.4 Computational Complexity
7.5 Global Routing Model
7.5.1 Model Efficiency
7.5.2 Global Routing Graph Computation
7.5.3 Supporting Dynamic Changes
7.6 Global Routing Algorithms
7.6.1 Two-pin Routing Algorithms
7.6.2 Minimal Bounding Box (MBB) Routing
7.6.3 Minimum Spanning Tree (MST) Routing
7.6.4 Path-Based Routing
7.6.5 Node-Based Routing
7.7 Benchmarking of Heuristics in Our Routing Model
7.7.1 Benchmark Problem Instances
7.7.2 Experimental Results
7.7.3 Concluding Remarks
7.8 Incremental Routing
7.8.1 Re-routing Nets Connected to Moved Modules
7.8.2 Re-routing Affected Nets Not Connected to Moved Modules
7.9 Impact of Routing on Placement Quality
7.9.1 Integrated Placement and Routing
7.9.2 Experimental Results
7.9.3 Conclusions
7.10 Concluding Remarks
8 Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations
8.1 Previous Work
8.2 Efficiency and Accuracy Requirements
8.3 Self-Parasitics
8.3.1 Wire Resistance, Capacitance and Inductance
8.3.2 Via Resistance and Area
8.4 Crosstalk
8.4.1 Substrate Coupling
8.4.2 Parasitic Coupling Capacitance
8.5 Process Variations
8.6 Incorporating Crosstalk and Parasitics into Routing
8.7 Incorporating Substrate Coupling into Placement
8.7.1 A Basic Module
8.7.2 Generalized 2-Dimensional Substrate Coupling Model
8.7.3 Substrate Coupling Impact Minimization
8.7.4 An Efficient Substrate Coupling Impact Minimization Algorithm
8.7.5 Implementation Considerations
8.7.6 Experimental Results
8.7.7 Conclusions
8.8 Incremental Substrate Coupling Impact Minimization
8.9 Concluding Remarks
9 Conclusions and Directions for Future Research
9.1 Conclusions
9.2 Directions for Future Research
Bibliography
Acknowledgement
Curriculum Vitae
List of Abbreviations
Average-Distance-Based Heuristic
Average-Distance Heuristic
Bounded Sliceline Grid
Chip Area
Computer-Aided Design
Coupling Impact
Constrained Maximum-Weight Common Subsequence
Directed Acyclic Graph
Direct View
Electronic Design Automation
Euclidean Steiner Minimal Tree
Field-Programmable Gate Array
Graph Steiner Minimal Tree
Integrated Circuit
Incremental Longest Paths
Intellectual Property
Iterated Shortest-Paths-Based Heuristic
Longest Common Subsequence
Left-Down Labeled Ordered Tree
Minimal Bounding Box
Minimum Spanning Tree
Maximum-Weight Common Subsequence
Maximum-Weight Monotone Subsequence
Non-Slicing Placement Evaluation
Random-Access Machine
Rectilinear Steiner Minimal Tree
Simulated Annealing
Substrate Coupling Impact Minimization
Steiner Minimal Tree
Sequence Pair
Shortest-Paths-Based Heuristic
Shortest-Paths Heuristic
Single-Source Shortest Paths
Very Large Scale Integration
Wire Length
Chapter 1
Introduction
The purpose of this chapter is to give the reader an idea of the framework in which this work is situated. Moreover, the need for automation of high-quality layout generation for mixed-signal designs is clarified. Motivated by this need, the objectives of this work are defined. Furthermore, an overview of the remaining chapters is given, followed by the main contributions of this work.
1.1 Background
The design of integrated circuits has been an actively exploited area for almost half a century. The possibility to integrate a plethora of functions onto a small piece of semiconductor material has enabled the development of many high-tech systems, e.g. the modern personal computer. Without exaggeration, one can state that without the invention of integrated circuits, the world would not be as it is today. With improvements in manufacturing technologies, the integration density of components within a single integrated circuit (IC) has also increased dramatically. The exponential growth of the number of components in an IC as a function of time still seems to hold and is expected to hold for at least another decade [1]. Figure 1.1 depicts this trend, better known as Moore's Law, graphically. Within this figure, a few keywords clarify some important trends. A very noticeable effect is that with increasingly smaller feature sizes and larger designs, the intrinsic speed of transistors increases, but the (global) wire delays also increase. With this trend, a vast area dedicated to the integration of circuits arose, called the field of Very Large Scale Integration (VLSI). Increasing the number of components in a given area has an obvious cost benefit, because the number of produced ICs per time unit increases when other factors are kept constant. However, higher integration also has a problematic side. The problems are even more pronounced due to the higher operating frequencies of current systems. Roughly speaking, one part of the problems is related to the percentage of working ICs, which is called yield. Yield is a complicated factor that has links with many aspects of VLSI technology, from system design to circuit design, to layout design, to technology. Moreover, due to smaller sizes, accuracy and power dissipation problems emerge.
Typically, most of these performance factors can be traded off against each other. The other part consists of the increasing influence of unavoidable parasitic elements such as parasitic resistances, capacitances and inductances. Parasitic substrate coupling and, to a lesser extent, electromagnetic coupling cannot be neglected anymore either. Simply stated, the non-ideal behavior of semiconductor material starts affecting the functionality of the IC in such a way that measures have to be taken to ensure good functioning. The correct functioning of a system is especially susceptible to these parasitic phenomena in the case of mixed-signal designs, where both (insensitive and noisy) digital and (sensitive) analog building blocks are present on the same chip.

Figure 1.1: The National Technology Roadmap for Semiconductors 1998. [Plot of technology trends: transistor size (nm), ASIC chip size and number of interconnect layers, against year (1997 to 2014) and technology node (0.25 to 0.035 µm).]

Practical experience has been used and is still used to limit the adverse effects of non-idealities. However, due to the high complexity of VLSI systems it is an immensely difficult task to handle all problems adequately, even for an expert designer. This is where the computer comes into play. When a computer is used properly, it is able to handle large amounts of data and process it in such a way that the generated output satisfies certain given specifications. The use of computers in a design task is called Computer-Aided Design (CAD). A more appropriate term in connection with computer-aided design of ICs is Electronic Design Automation (EDA), which includes electronic CAD tools but is more general. The purpose of a CAD tool is to support the designer during the process of realizing an IC.

The final physical outcome of the design process is a disc of silicon, called a wafer, which consists of a number of more or less identical copies of the same integrated circuit. This wafer is then cut, resulting in a set of dies, each of them containing the same integrated circuit. The creation of such a wafer is accomplished using a set of masks which are used to deposit several layers of different materials onto the wafer. The task of organizing geometric information in this context, resulting in an answer to the question of which materials have to be put where on the wafer, is called physical design. The end result of a physical design step is called a layout, which is essentially a set of masks that comply with given design rules.
The terms layout synthesis and layout generation are also used frequently in the same context. Although layout generation is a very important step in the VLSI process, it is only one of many steps. Some other important steps are circuit design, simulation and verification. Our primary concern in this work is the layout generation step of VLSI design. More specifically, we deal with a remarkably interesting and challenging subclass of designs that contain both analog and digital ingredients. In essence, layout generation is accomplished by solving two strongly coupled problems under a set of constraints. These two problems are known as the placement and the routing problem.
pre-determined performance sensitivity values. An implicit assumption of this approach is that the circuit is sufficiently linear in the region in which the layout parameters under consideration have influence. The ultimate goal is a layout that satisfies all performance constraints by construction. Several other groups and researchers have attempted to transfer the methodologies which are used in the digital VLSI domain to the analog and mixed-signal VLSI domain, but most of these approaches have not been very successful as yet. The main reason is that digital approaches rely on certain assumptions (to reduce complexity) that are simply unacceptable in the analog domain. A good example of this is the consideration of only a critical path to determine the quality of wiring.
1.3 Motivation
From the previous discussion it should be clear that VLSI design consists of several tasks which are very hard to solve in a proper way. The design step that considers layout generation is also extremely difficult, and the quality of a layout is of utmost importance in any IC design. Thus far, only a few researchers have concentrated on layout generation for mixed-signal designs. Although layout generators are known for analog designs, those systems are usually not suitable for general application to mixed-signal designs, where the layout problems are worst. No single layout generator is best compared to the others. All existing approaches and systems have fundamental limitations and weaknesses. Current layout generation systems suffer from at least one of the following problems.

The placement of objects (sub-circuit modules) in a two-dimensional plane is performed poorly with respect to wiring quality.
Only a subset of mixed-signal design constraints is (or can be) taken into account during placement and routing.
Ad hoc solutions are used which are not very robust and require a significant amount of (problem-dependent) tuning effort.
Scalability properties are poor due to inefficient modeling and/or implementation.
Thus, the necessity of improved layout generation concepts and systems is clear.
with better scalability performance. The latter approach is more interesting from a scientific point of view. The established methodology and concepts should be practically useful, as demonstrated by simulation results. Scalability and generality of the approach are also a major concern. These features should be clear from theory, preferably backed up by experimental results.
As these concepts are tightly coupled with a certain problem, it is more convenient to introduce and elaborate on them while discussing the underlying problem. A fundamental concept to improve efficiency is incremental computation. In essence, the idea is to compute new information only when it is strictly necessary. We show that this approach leads to fundamental improvements in placement and global routing efficiency in the adopted stochastic (simulated annealing) optimization framework. Note that higher computational efficiency automatically implies better scalability properties. In each chapter, where appropriate, experimental results are given after the discussion of the respective algorithms. Furthermore, we have attempted to describe and present the experiments (and their results) in such a way that comparison with existing works is not hampered. We end this thesis with main conclusions and directions for future research in Chapter 9.
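The idea of incremental computation inside an annealing loop can be sketched as follows. This is a toy illustration under stated assumptions, not the method developed in this thesis: modules sit on fixed slots, a perturbation swaps two modules, and the cost is the sum of half-perimeter wire lengths. All names (`hpwl`, `full_cost`, `anneal`) and the schedule parameters are hypothetical. The point is only that a swap changes the cost of the nets touching the two moved modules, so only those nets are re-evaluated.

```python
import math
import random

def hpwl(net, pos):
    """Half-perimeter wire length of one net (a list of module ids)."""
    xs = [pos[m][0] for m in net]
    ys = [pos[m][1] for m in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def full_cost(nets, pos):
    """Reference cost: recompute every net from scratch."""
    return sum(hpwl(net, pos) for net in nets)

def anneal(nets, pos, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Toy annealer: swap two module positions, update the cost incrementally."""
    rng = random.Random(seed)
    modules = sorted(pos)
    # Index: module -> ids of the nets that contain it.
    nets_of = {m: set() for m in modules}
    for i, net in enumerate(nets):
        for m in net:
            nets_of[m].add(i)
    net_cost = [hpwl(net, pos) for net in nets]
    cost = sum(net_cost)
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(modules, 2)
        affected = nets_of[a] | nets_of[b]  # only these nets can change
        pos[a], pos[b] = pos[b], pos[a]
        new = {i: hpwl(nets[i], pos) for i in affected}
        delta = sum(new[i] - net_cost[i] for i in affected)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            for i, c in new.items():  # accept: commit the new net costs
                net_cost[i] = c
            cost += delta
        else:
            pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        temp *= alpha
    return cost
```

With integer coordinates the incrementally maintained cost stays exactly equal to a full recomputation, which is a convenient sanity check when experimenting with such a loop.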
A novel incremental approach to exact placement optimization is presented. The approach gives significantly better asymptotic computational complexity results for a single placement computation iteration within a simulated annealing environment [9]. A theoretical analysis is given, supported by experimental results.
A new consistent linear-time algorithm is given for mapping a given placement of modules in a user-specified region to an efficient formal representation. The algorithm is, for instance, useful for converting graphical (user-interface) data to an abstract format which can be further processed by means of efficient algorithms. The algorithm can also be utilized for applying hierarchical notions to a given placement.
An improved robust placement algorithm is given which can incorporate range and boundary constraints imposed on specific modules in an efficient manner. Experimental results are given to illustrate this.
A framework has been established which incorporates placement and global routing. Within this framework it is relatively easy to incorporate physical problems related to the spatial distribution of objects in a plane [10]. This type of consideration is becoming increasingly important for contemporary mixed-signal designs. For example, substrate coupling in high-performance mixed-signal ICs [11] is causing serious circuit performance degradation, and surface gradients in high-speed digital-to-analog converter designs cause serious spectral performance losses [12]. We note that the overhead in computational complexity is mostly in constant factors due to an efficient combination of advanced data structures.
We establish new results on very fast Steiner minimal tree (SMT) approximation algorithms in combination with efficient dynamic routing graph models. The novelty is mainly due to the combination of an efficient sparse routing graph model and improved shortest-paths-based heuristics. We compute heuristic routing results on a broad set of placement-derived problem instances and net sizes, which we compare to optimal routing solutions. These optimal solutions are obtained using state-of-the-art third-party tools. The differences in solution quality are small (0% to 5%), which implies that more expensive SMT heuristics can only improve marginally at the cost of a (significant) increase in execution time. It turns out that due to the difficulty of some of the routing problem instances, not all optimal solutions are computable [13]. It is the first time that, in the context of accurate placement and global routing, optimal results are computed for routing problems derived from practical placements.
Extensive experimental evaluation of the proposed placement and routing algorithms has led to new results which compare favorably with existing state-of-the-art results. Furthermore, based on experiments, we expose discrepancies in most current packing-centric works that use inadequate routing schemes.
Chapter 2
Problem Definition
This chapter describes the problem which is attacked in this thesis in more detail. We show explicitly where the problem of layout generation is located within the overall VLSI design cycle. Then we zoom in on the layout problem and show that it is a non-trivial problem to solve. In order to solve the problem adequately, it first has to be defined in an accurate way. One part of problem definition entails proper modeling of physical entities. The other part is the formulation of given real-life specifications into simpler specifications that can be handled properly at an algorithmic level. In principle, the layout problem can be split into two strongly coupled parts. One part is the placement or floorplanning problem, the other part is the routing problem. A typical layout problem could be stated as follows. Given a set of geometric objects to be placed in a two-dimensional plane, place these objects in such a way that a certain cost function is minimal. A standard textbook on physical design automation will take for the cost function the total length of all interconnecting wires. The catch is that in order to compute or estimate the length of a wire, placement information is needed. But routing information is needed to compute a placement! This loop can, for instance, be broken by computing a placement based on the number of interconnections between the blocks; blocks with many interconnections should be placed closer together than blocks with fewer interconnections. In textbooks, this approach is called the min-cut problem. It is an approximation to the layout problem, in that it does not deal with wire length but with the number of interconnections. For more information on the layout problem the reader is referred to, e.g., [14, 15].
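The connection-count idea behind min-cut placement can be illustrated with a minimal Python sketch. It only evaluates the cut size of a given two-way partition; it is not a partitioning algorithm, and the function name `cut_size`, the net format and the module names are hypothetical choices made for this illustration.

```python
def cut_size(nets, side):
    """Count the nets that have modules on both sides of a bipartition.

    nets: iterable of tuples of module names (illustrative format).
    side: dict mapping each module name to 0 or 1.
    """
    return sum(1 for net in nets if len({side[m] for m in net}) > 1)


# Hypothetical example: four modules and three nets.
nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]
partition_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
partition_2 = {"a": 0, "b": 0, "c": 0, "d": 1}
print(cut_size(nets, partition_1), cut_size(nets, partition_2))
```

A partition with a smaller cut keeps strongly connected modules together, which is the sense in which the number of interconnections stands in for wire length.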
In cases where we precisely know the impact of a certain higher-level decision on a lower level, a top-down approach is very convenient. However, when problems become more complicated and interdependencies start playing an important role, it is almost impossible to accomplish the task in an adequate way with solely top-down information. As the amount of information from a lower level becomes increasingly important on a higher level, we speak of a bottom-up flow. We make plausible that, indeed, a bottom-up approach naturally applies to layout generation. During the top-down flow of the design cycle, we arrive at the point where information needs to be supplied to the layout generation system. Hence, the interface of the layout system must be defined explicitly, facilitating communication of relevant information to the layout system. As a consequence, the layout system itself can operate more efficiently and consistently generate predictable results as a function of several input parameters which will be described shortly.
Figure 2.1: A general design cycle showing important aspects of VLSI design.
modules. After placement and routing, the layout can be manufactured and finally tested to see if its functionality and performance comply with the original idea and its specifications. Although the overall flow of information is top-down, the last four blocks in the diagram are drawn bottom-up. The intention is to make clear that the physical design part requires an essentially bottom-up approach. The reason for this is that a mixed-signal design has both digital and analog components and typically these components are highly interconnected. It is this class of designs that is impacted most severely by parasitic effects such as substrate coupling, delay, mismatch, etc. As such, it is virtually impossible to decouple placement and routing while targeting high quality. Thus, predicting the result of placement and routing is at least as difficult. The outer flow in the figure states what type of information is exhibited at a certain stage in the design cycle. At the highest level the behavioral representation is prominent. After that, the structural representation becomes important, in which more precise information is given on what functions are performed where. At the technological representation level, the implementation aspects come into play. It specifies what type of circuit elements are used, their properties, and so on. The physical representation level comprises everything that is directly related to the layout of the circuit on the wafer. Finally, a prototype IC is available. Note that the direction of the arrows only indicates the flow of the processes in time for each part of the overall design, not the interdependency of the processes. For example, in order to perform adequate placement and routing, information is needed on certain specifications. Furthermore, in the architectural design, testing facilities should be taken into account. In short, strong interrelationships exist between almost all of the VLSI process steps.
Hence, it is impossible to regard a specific process step without taking notice of the other steps. On the other hand, including many process steps in an attempt to find a universal layout methodology would be too idealistic because of the intrinsic problem complexities involved. A way to solve this dilemma is to define an interface from each block to the other blocks, specify exactly what is input and what is output, and find a methodology that will provide high-quality layouts within the confined framework. This is the well-known top-down approach.
are conceived for the placement and routing phase, which are the core problems of physical design. The initial layout needs to be checked for design rule compliance. After that, an extraction of the layout needs to be performed. The extracted information is an annotated netlist including all parasitic elements which are not or only partially accounted for in the circuit netlist (schematic). This annotated netlist is compared with the original netlist to see if any discrepancies have been introduced, apart from the parasitics. Using the annotated netlist, circuit simulations are performed, typically with a Spice-like simulation tool. If all is well, and the specifications are complied with, the final layout is ready to be fabricated. If something is wrong, a change in the placement/routing is required and the loop is repeated until the layout is acceptable. However, it may turn out that the layout system cannot find a satisfactory solution (even if the system were ideal). In such cases there is an escape route via the dotted arrows to adjust, for example, the specifications, or transistor models which are used by the simulator.
Figure 2.2: A classical flow of the physical design step.
used to define the routing. Instead, the interconnect is defined by a separate process step. Therefore, unlike an FPGA, a sea-of-gates design cannot be used easily for rapid in-house prototyping. Another noticeable difference between FPGA and sea-of-gates is that the latter has a very fine grain size compared with the former. Typically, an FPGA primitive cell consists of a multiple-transistor circuit, whereas a sea-of-gates primitive cell is a single transistor.
the process technology data: design rules, via resistance and capacitance, metal sheet resistances, substrate resistance, guard ring constructions, etc.;
information on pins: the allowable range of resistance ( , ), capacitance ( , ), inductance seen at the output of a pin, the range of current amplitudes;
information on modules: height ( ) and width ( ) of each module, the exact position of each pin connected to a module, the nets connected to a module, the sensitivity or noisiness of a module, etc.;
a cost function: the parameters that need to be optimized in the layout, importance of certain parameters over others, constraints on specific module positions, constraints on the total layout size or aspect ratio, etc.
A pin is located at the perimeter of a layout module, and forms a gateway to the outside world as seen from the module. Furthermore, a (layout) module is assumed to be rectangular.
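The module and pin information listed above could be captured in simple records. The following Python sketch is only one possible encoding under stated assumptions: the class names `Pin` and `Module`, their fields, and the choice of module-local coordinates with the origin at the lower-left corner are all hypothetical and not taken from this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Pin:
    name: str
    x: float   # position relative to the module's lower-left corner (assumed)
    y: float
    net: str   # name of the net this pin belongs to


@dataclass
class Module:
    name: str
    width: float
    height: float
    pins: list = field(default_factory=list)

    def pin_on_perimeter(self, pin):
        """Check the assumption that a pin lies on the rectangle's boundary."""
        inside = 0 <= pin.x <= self.width and 0 <= pin.y <= self.height
        on_edge = pin.x in (0, self.width) or pin.y in (0, self.height)
        return inside and on_edge


# Hypothetical example: one valid pin and one pin in the module interior.
opamp = Module("opamp", width=4, height=2)
out_pin = Pin("out", 4, 1, "n1")
bad_pin = Pin("bad", 2, 1, "n2")
print(opamp.pin_on_perimeter(out_pin), opamp.pin_on_perimeter(bad_pin))
```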
Figure 2.3: Top-down flow of a physical design process.
The output of the layout system is a layout which is essentially similar to the high-level overall reference circuit. After all parasitic elements have been added to the reference circuit schematic, a simulation should show that the layout complies with all specifications. It is important to note that although issues such as yield and reliability are not taken into account, the layout system should not preclude the integration of these important matters. Therefore, generality of the layout system is a concern throughout this work.
generality must be high, to allow for easy incorporation of models of performance degradation; robustness must be high, to produce consistently good and predictable solutions.
Note that minimizing sensitivity is similar to maximizing flexibility in parameter value range in practical circumstances. Implicitly, this mapping is shown in Figure 2.3. Generally it is not trivial to perform this mapping. Hereafter, the mapping problem is discussed to make the reader aware of this problem, but no solution is proposed in this thesis.
Table 2.1: Examples of high-level circuit specifications.
system type       typical specifications
analog filter     bandwidth; quality factor
D/A converter     integral nonlinearity; differential nonlinearity; spurious-free dynamic range
digital decoder   maximal propagation delay; fan-out; clock frequency; logic functionality
Table 2.2: Examples of high-level sensitivities.
system type       typical high-level sensitivity
analog filter     robustness of transfer function to variations of passive components in the circuit
D/A converter     clock jitter effect on spurious-free dynamic range
digital decoder   variations in signal delay relative to output load
(2.1)
where is some kind of performance measure and is a lower-level parameter. If high-level sensitivities are known, (2.1) can also be computed using
(2.2)
where is a high-level parameter (such as clock jitter). Layout system sensitivities are (implicitly) represented by the range of allowable values for each pin parameter: , etc. Modules also have low-level sensitivity measures associated with them. For instance, module noisiness and module sensitivity are two module parameters that are useful for minimizing the detrimental effect of substrate coupling. The former quantifies the performance degradation that a module can inflict on neighboring modules. The latter quantifies the vulnerability of a certain performance measure to substrate noise.
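The relation between low-level and high-level sensitivities described around (2.1) and (2.2) can be read as an application of the chain rule. The block below is a hedged reading, not a transcription of the original equations: the symbols $P$ (performance measure), $x$ (lower-level parameter), $h$ (high-level parameter) and $S$ (sensitivity) are names assumed here for illustration only.

```latex
% Assumed notation: P = performance measure, x = lower-level parameter,
% h = high-level parameter, S = sensitivity of P with respect to x.
S = \frac{\partial P}{\partial x},
\qquad
\frac{\partial P}{\partial x}
  = \frac{\partial P}{\partial h} \cdot \frac{\partial h}{\partial x}
```

Under this reading, a known high-level sensitivity such as the effect of clock jitter on a performance measure is multiplied by the dependence of that high-level parameter on the layout-level parameter.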
We do not allow over-the-cell routing. Although more than two metal layers are typically available for routing in modern process technologies, the exploitation of the lowest metal layers has several benefits: the reduction of routing problems at higher layers, the reduction of yield-decreasing vias, the avoidance of unpredictable interaction with intellectual property (IP) blocks. Each pin-interconnecting network should have minimal length. For reasons of simplicity, but without being too restrictive, we assume that this is an optimal way to connect the pins of a net. Unfortunately, this apparently simple problem (at least for a small number of pins) is a very hard problem which is better known as the Steiner minimal tree problem [19, 20].
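To make the difference between a spanning tree and a Steiner tree concrete, the following Python sketch compares the rectilinear minimum spanning tree of three pins with the best tree obtained by adding a single extra (Steiner) point taken from the grid spanned by the pin coordinates. This is a toy illustration under simplifying assumptions (rectilinear distances, no obstacles, at most one added point); it is not one of the routing heuristics of this thesis, and all function names are hypothetical.

```python
from itertools import product


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_length(points):
    """Length of a rectilinear minimum spanning tree (Prim's algorithm)."""
    points = list(points)
    if len(points) < 2:
        return 0
    # dist[i] = cheapest connection from point i to the tree built so far
    dist = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while dist:
        i = min(dist, key=dist.get)
        total += dist.pop(i)
        for j in dist:
            dist[j] = min(dist[j], manhattan(points[i], points[j]))
    return total


def best_single_steiner_point(pins):
    """Try every grid point spanned by the pin coordinates as one extra node."""
    xs = sorted({p[0] for p in pins})
    ys = sorted({p[1] for p in pins})
    best_len, best_pt = mst_length(pins), None
    for cand in product(xs, ys):
        if cand in pins:
            continue
        length = mst_length(list(pins) + [cand])
        if length < best_len:
            best_len, best_pt = length, cand
    return best_len, best_pt


pins = [(0, 0), (2, 0), (1, 1)]
print(mst_length(pins), best_single_steiner_point(pins))
```

For these three pins the spanning tree has length 4, while adding the point (1, 0) yields a tree of length 3. Finding the best set of such extra points for larger nets is what makes the problem hard.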
We justify these restrictions using the following arguments. In mixed-signal layout design the effective use of space around modules in the lowest metal layer decreases the unwanted coupling between, for instance, polysilicon and metal considerably. Moreover, any created coupling can be controlled much more tightly. The coupling from higher metal layers to the bottom layers is significantly smaller, which justifies the use of higher metal layers for over-the-cell routing. The minimal-length metric is less restrictive than it seems since it does not imply a geometric metric. In fact, a very broad class of interconnection networks can be covered by defining (sophisticated) weight functions for the branches in the network. These weight functions typically depend on physical properties of each branch, e.g. the voltage/current variation and magnitude, or the physical location of each branch. The latter accommodates (parasitic) interaction of this branch with neighboring obstacles. Although area and wire-length constraints are very important, they are definitely not the sole constraints relevant to placement and routing. Especially in the context of mixed-signal layout generation, we must refine this set of primary constraints and additionally include, or allow for inclusion of, performance-related constraints such as substrate coupling impact minimization, crosstalk minimization, optimal matching, etc. The proposed framework should be able to incorporate the overall set of constraints in an efficient manner.
Chapter 3
Optimization Methods
In this chapter a variety of well-known VLSI optimization methods is described. As pointed out in Chapter 2, there are many constraints involved in mixed-signal layout generation, which makes this task intrinsically difficult to solve properly. Moreover, due to the many types of constraints that are involved, the type of optimization algorithm which is used to generate a layout can have a significant influence on the final result, both in quality and in computation time. Naturally, each type of optimization framework has its pros and cons. The points that are regarded as important in our task are:
easy handling of a heterogeneous mixture of constraints,
efficient placement and routing representations,
efficient computation of placement and routing solutions,
practical achievability of near-optimal results,
low implementation complexity.
First an overview of existing approaches to successful VLSI optimization is given. Then one of the approaches is selected, based on the previously described criteria, and used for our optimization framework. It should be noted that most of the described optimization methods have been shown to work well on a given set of problems. Conversely, it is a known fact that an optimization method that performs well on a certain class of problems might perform poorly on another class of problems, with or without tuning. Thus, generalizing results to related or modified problems should be done with utmost caution. We attempt to place the briefly presented methods under the same uniform umbrella of placement and routing. However, only some of these methods have properties which are suitable for general placement and routing, taking into account the previously mentioned important points. We elaborate on one of the most promising methods, which is known as the simulated annealing algorithm.
are intractable, i.e. it requires an excessive amount of time to solve a problem instance to optimality when the instance size is increased. In other words, the problems are NP-hard [21]. Nonetheless, in practice the layout generation problem is split into a placement and a routing phase. The latter may again be split into a global routing and a detailed routing phase. As a direct consequence of the NP-hardness of layout generation, we have to resort to heuristic or approximation methods that yield an acceptable solution within reasonable time. The following classification might not be optimal, but it is one that matches well with contemporary ideas. Furthermore, it provides a good impression of the vast body of research activities in this field. An extensive overview can be found in [14]. A very recent and more mathematically flavored comprehensive overview is contained in [22].
sub-optimality of the solution, high execution speed, the same solution is found each time the algorithm is run.
As deterministic algorithms were the first type of algorithms to see the light, the number of such algorithms is very large. Only a few deterministic algorithms will be mentioned here.
Problem-dependent Methods
Rule-based algorithms In this approach, expert knowledge is translated into rules which are used by the system to generate a proper layout. Clearly, the quality of the rules is of paramount importance. Furthermore, the set of rules should be adapted to accommodate new types of circuits and layout techniques that are introduced. As a consequence, maintaining a good set of rules is labor intensive. A fundamental problem in connection with a rule-based approach is the difficulty of defining general and context-independent rules.
Template-based algorithms As the name implies, templates are used as a starting point, guided by specific values of input parameters, to transform a certain template to a proper layout. The creation of the templates is a knowledge-intensive task, which is one of the main bottlenecks of this approach. Moreover, the set of obtainable layouts is limited to the set of available templates and their combinations.
Problem-independent Methods
Linear programming algorithms A linear programming algorithm describes the problem as an $m \times n$ constraint matrix $A$, an $m$-vector $b$, and a cost vector $c$. A solution of a linear problem is then one that satisfies the linear constraints $Ax \le b$ and $x \ge 0$, while minimizing $c^T x$.
Divide-and-conquer algorithms Divide-and-conquer algorithms partition the problem into more or less independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.
Dynamic programming algorithms Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems. Programming in this context refers to a tabular method, not to writing computer code. In contrast to divide-and-conquer algorithms, dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems. In this respect, a divide-and-conquer algorithm does more work than necessary by solving common subsubproblems more than once. The latter is avoided by dynamic programming through the use of a table in which each solution to a solved subsubproblem is stored, saving a significant amount of computation time.
Branch-and-bound algorithms The branch-and-bound method is an exact method that can be applied to a broad class of problems. All that is required is a tree-structured configuration space and an efficient way of computing tight lower bounds on the cost of all solutions containing a given partial solution. Typically this method can only be applied successfully to small problem instances, but with clever pruning techniques a larger solvable range can be reached.
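The difference between divide-and-conquer and dynamic programming on overlapping subproblems can be seen in a small Python sketch. The example problem (counting monotone lattice paths in a grid) and both function names are chosen here purely for illustration and do not come from this thesis.

```python
def paths_divide_and_conquer(r, c, calls):
    """Plain recursion: shared subproblems are solved again and again."""
    calls[0] += 1  # count how many subproblem evaluations take place
    if r == 0 or c == 0:
        return 1
    return (paths_divide_and_conquer(r - 1, c, calls)
            + paths_divide_and_conquer(r, c - 1, calls))


def paths_dynamic_programming(r, c):
    """Tabular method: each subproblem is solved once and stored."""
    table = [[1] * (c + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        for j in range(1, c + 1):
            table[i][j] = table[i - 1][j] + table[i][j - 1]
    return table[r][c]


calls = [0]
recursive_result = paths_divide_and_conquer(8, 8, calls)
tabular_result = paths_dynamic_programming(8, 8)
print(recursive_result, tabular_result, calls[0])
```

Both functions return the same count, but the recursive version evaluates far more subproblems than the 81 table entries filled in by the tabular version.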
Two well-known stochastic algorithms are simulated evolution [23, 24] and simulated annealing [25]. Lately, there has been an increased interest in so-called memetic algorithms, a concept which sprouted from the mind of Dawkins [26]. Memetic algorithms are a generalization of genetic algorithms in which the human mind plays a crucial role; cultural influences have a significant effect on the survival capability of a certain species, in conjunction with specific natural genetic properties.
shown in Chapter 6, the SA algorithm is a very promising candidate for the layout generation problem. A separate section is dedicated to discussing general features of it.
$\mathrm{Prob}\{s_j \text{ is accepted}\} = \exp\left(-\frac{C_j - C_i}{T}\right)$ (3.1)
where $C_i$ and $C_j$ are the costs associated with states $s_i$ and $s_j$, respectively, and $T$ is the temperature of the system. The function of the temperature is as follows. When $T$ is large (compared to a typical cost difference), the right-hand side of (3.1) is close to one. This implies that at high temperatures, cost increments are almost always accepted. When the temperature is decreased gradually, the impact of the cost difference becomes more pronounced.
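Input: solution state space and all optimization parameters
Output: (near-)optimal state s_best

 1  s := random initial state
 2  C := cost(s)
 3  s_best := s; C_best := C
 4  while stop criterion not met
 5  do
 6    generate s_new from s
 7    compute cost C_new of s_new
 8    if random < accept(C, C_new, T)
 9    then
10      s := s_new
11      C := C_new
12    fi
13    if C < C_best
14    then
15      s_best := s
16      C_best := C
17    fi
18    decrease temperature T
19  od
20  return s_best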
Figure 3.1: A basic simulated annealing algorithm.
Consequently, at low temperatures the right-hand side of (3.1) will be close to zero for a typical cost increment. Thus, the probability of accepting the corresponding state will be very small. On lines 13-17, the best state and its associated cost are stored for later reference. The temperature is lowered on line 18. Although the basic simulated annealing algorithm is simple in appearance, and has good practical performance, its internals are not well understood. Simulated annealing has attracted much attention because it treats every problem as a black box. Therefore a very large class of problems can be solved using SA. A few examples are: combinatorial problems [28], function optimization problems [29], and neural network optimization and training [30, 31]. Many adaptations of and extensions to the classical SA algorithm are known. The existing literature on this topic is too extensive to cover here. We only mention a few interesting concepts and approaches. In [32] Boese and Kahng observe that under finite-time conditions, the classical monotonically decreasing temperature schedule is not optimal when the best solution seen so far is the output of the SA algorithm, as opposed to the last solution seen (that is accepted). A recipe is given to derive a (near) optimal best-so-far temperature schedule. In [33] Cong et al. propose to use a dynamic weighting Monte Carlo approach for floorplanning; they obtain promising results. The essential difference in their approach is an SA algorithm with a stochastic temperature schedule. In a general fashion, we can state that function decrease temperature should be replaced by adjust temperature in order to maximize the power of SA. The generality of SA comes at the cost of a large amount of computational resources that are required for practical problems. There are two ways to reduce the amount of computational resources. The first one is to minimize the number of iterations.
This can be accomplished in various ways: by choosing a better representation of the problem, by modifying the cooling schedule, by choosing a better generation function, by finding more suitable perturbation operators, etc. A mixture of the aforementioned items is also conceivable. Actually, it is not known how to minimize the number of iterations in an optimal way. Most approaches rely on intuitive notions. Altogether, a practical SA implementation is truly heuristic in nature. The second way is to reduce the computations within a single SA iteration to a minimum. This approach is taken in this thesis. To state this in a more abstract way: the computational complexity of a single SA iteration is taken as the performance measure. For more information on computational complexity matters, the reader is referred to [34]. A point worth noting is the fact that for practically all non-trivial problems, computing the cost associated with a new state is the most time-consuming task. Therefore, it is of interest to investigate this part of the algorithm. The key ingredients for an SA algorithm are discussed next.
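The basic annealing loop discussed in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function name `simulated_annealing`, its parameters, the exponential acceptance rule and the geometric cooling step are choices made here for the sake of a runnable example.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0=10.0, alpha=0.95,
                        steps=2000, rng=None):
    """Minimize cost(state); all parameter names are illustrative."""
    rng = rng or random.Random(0)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)   # generation function
        candidate_cost = cost(candidate)     # usually the expensive part
        delta = candidate_cost - current_cost
        # Improvements are always accepted; cost increments are accepted
        # with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:         # remember the best state seen
            best, best_cost = current, current_cost
        t = max(t * alpha, 1e-12)            # assumed geometric cooling
    return best, best_cost


# Hypothetical toy problem: minimize (x - 3)^2 over the integers.
best, best_cost = simulated_annealing(
    50,
    cost=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
)
print(best, best_cost)
```

In this sketch the cost of every candidate is recomputed from scratch, which is exactly the per-iteration work that an incremental formulation aims to reduce.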
Figure 3.2: The influence of problem representation on the cost landscape.
better representation, in terms of smoothness of the landscape, is given by . The global minimum of also coincides with the optimal solution . Another representation is even smoother, but its global minimum deviates from . This representation might not be a proper candidate. Whether or not this is the case depends on the amplitude of the deviation . Note that easy evaluation of does not imply that the landscape is smooth (and vice versa). Smoothness of the landscape is, among others, determined by the ordering defined on . Consequently, for different orderings of , the appearance of the landscape changes, but all
global minima remain intact. Furthermore, for non-trivial problem instances we usually do not know an optimal solution. As a consequence, the de facto standard way of benchmarking is comparing with best known solutions. Note that the function is also called the cost function in a simulated annealing context, and is called a state. The global optima of the cost landscape are solely defined by the cost function. Furthermore, the shape of the cost landscape is to a large extent determined by the set of perturbation operators and the generation function defined within the simulated annealing environment; they define the local optima [35]. The previous statement can be explained by observing the fact that the perturbation operators and generation function determine whether or not a given state is optimal relative to its neighbors. In other words, the perturbation operators and generation function determine the neighbors of a state and thus an ordering on , and consequently define all non-global local optima. Both of these ingredients will be discussed shortly.
(3.3)
which is also called the Metropolis criterion. This has been the standard way of defining the acceptance function since the introduction of simulated annealing for optimization [25]. The generation function generates a new state from the current state. In its simplest form, the generation of a new state is independent of the current state and the current temperature. In its most sophisticated form, many optimization parameters can be involved, for instance the current state, some kind of estimation of an error gradient, the temperature, etc. [36]. It should be noted that, typically, the choice of a certain generation function feature is based on heuristic grounds. Moreover, problem-dependent tuning is required in these cases. The generation function has been subject to many modifications. The exact appearance of this function is closely related to the representation (the state space) of the problem and the set of perturbation operators defined on it. It is loosely coupled with the cost function, and normally the generation function is not modified when the cost function is. An important requirement for the generation function is that it allows traversal of the entire state space. A good rule of thumb is, in a probabilistic sense, to allow traversal of the state space in a small number of steps at high temperatures, and to reduce the reachability of states when the temperature decreases. Ultimately, the latter is similar to a local search strategy.
$T_k = \alpha^k T_0$ (3.4)
where $\alpha$ is a cooling constant which determines the rate of cooling, $k$ is an iteration index which can be associated with discrete time, and $T_0$ is an initial constant temperature. The simple temperature schedule of (3.4) ignores in principle all problem-related aspects such as the irregularity of the cost landscape and lacks solution quality awareness. As such, it is not very robust and generally it needs tuning for each problem instance in order to obtain acceptable results. In standard simulated annealing [25], the temperature can only be decreased according to a certain schedule. Other, more general, schedules exist but their general applicability is not known. It is worthwhile to note that non-monotone schedules are certainly worth investigating, motivated by promising results on a few (small) problem instances [32]. The only provably good temperature schedule, as yet, is due to Hajek [37]. He proved that the basic simulated annealing algorithm is guaranteed (in a statistical sense) to find an optimal state, i.e. one with minimal cost, when the cooling schedule has the following shape
where
(3.5)
Another often used temperature schedule is due to Huang and Sangiovanni-Vincentelli [38]. In this scheme the temperature decrement is calculated such that the slope of the observed annealing curve follows an assumed ideal annealing curve in which the average cost of configurations decreases by an essentially constant amount measured against a scale. The derived expression is
(3.6)
where is the standard deviation of the cost seen at temperature , and is a positive-valued tuning factor which modifies the rate of cooling; with a typical value of .
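The two practical schedules discussed above can be sketched as small Python helpers. Both forms are assumptions made for this illustration: the geometric rule is the usual reading of a schedule with a cooling constant, an iteration index and an initial temperature, and the adaptive rule is the form in which the Huang schedule is commonly cited, with the temperature multiplied by $\exp(-\lambda T / \sigma)$. The function names and the default value of `lam` are hypothetical.

```python
import math


def geometric_schedule(t0, alpha, k):
    """Assumed geometric cooling: temperature after k cooling steps."""
    return t0 * alpha ** k


def huang_step(t, sigma, lam=0.7):
    """One step of the commonly cited Huang-style adaptive rule.

    t:     current temperature
    sigma: standard deviation of the cost observed at temperature t
    lam:   positive tuning factor (default chosen for illustration only)
    """
    if sigma <= 0:
        return t  # no cost spread observed; leave the temperature unchanged
    return t * math.exp(-lam * t / sigma)


print(geometric_schedule(100.0, 0.9, 2), huang_step(10.0, 5.0))
```

A large cost spread at the current temperature leads to a small decrement, while a small spread lets the temperature drop faster.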
the smoothness of the cost landscape; the smoother the better. Since the smoothness is determined by the cost function, the problem representation and the perturbation operators, a great amount of attention is required to choose them well.
Chapter 4
the conceptual simplicity of the simulated annealing algorithm,
the robustness of the simulated annealing algorithm,
the versatility of simulated annealing with respect to the type and extensiveness of problems and their constraints that can be handled,
the ease of formulating a layout problem in terms of simulated annealing ingredients,
the reported effectiveness of simulated annealing with respect to layout problems in current literature.
1 In connection with simulated evolution one should read fitness function, and in connection with simulated annealing one should read cost function. Moreover, maximizing fitness is equivalent to minimizing cost.
In the next sections we explain how the layout generation concepts are integrated into the overall simulated annealing optimization framework. We present a flow in which concepts that will be clarified in later chapters are briefly touched upon in order to facilitate explanation of the integration of these concepts within the global framework. The main concepts are placement (Chapter 6), routing (Chapter 7), and physical phenomena such as crosstalk, parasitics and process variations and their impact on layout generation (Chapter 8). The aforementioned concepts are formulated in a novel incremental approach which is one of the main contributions of this work.
Figure 4.1: Flow of the simulated annealing approach incorporating placement and global routing.
step in the sequence which is the detailed routing step. If the choice is made to continue placement optimization, then subsequently the temperature is adjusted according to a predefined temperature cooling schedule. Next, the cost function is evaluated and, depending on the outcome, the current placement is accepted or rejected.2 If it is rejected then the previous placement becomes the current placement and we proceed from this point. If the current placement is accepted then we perturb the placement (by perturbing the sequence pair placement representation) and compute a new placement. This loop is iterated until the stop condition evaluates to true. Detailed routing is then performed, which is assumed to be possible by virtue of proper previous optimization steps, and a final layout is generated.
2 We assume that the optimization process yields only yes evaluations during the first iteration loop.
4.2.1 Placement
Based on reasons which are given in Chapter 6, we use the sequence-pair placement representation. Basically, the sequence pair (SP) representation fits well into an iterative optimization framework where (small) changes are applied to the placement during each iteration. Furthermore, the SP structure has advantageous properties in the context of mixed-signal layouts, such as a general non-slicing structure and low global sensitivity to small local changes. Also, important issues such as matching constraints [45], range constraints [46], boundary constraints [47], and interconnect constraints can be incorporated into a sequence pair formulation. Formally, an SP consists of a pair of sequences [48]:

$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \quad \text{and} \quad \beta = (\beta_1, \beta_2, \ldots, \beta_n).$$

Every sequence is a permutation of the set of integers $\{1, 2, \ldots, n\}$, where $n$ is the number of modules to be placed. Consequently, the sequence pair solution space contains $(n!)^2$ elements. As a result, changing a placement comes down to changing the associated permutation. In the optimization framework, the placement of blocks is split into three parts: a relative placement part, an absolute placement part, and a detailed placement part. This way, the placement problem can be handled more efficiently at an abstract level while at the same time allowing a clear graphical interpretation of what is going on during optimization. The latter property leaves a window open for the designer to obtain insight into the procedure and tune the algorithms and (intermediate) results.
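To make the step from relative to absolute placement concrete, the following is a minimal sketch in Python. It assumes the interpretation of a sequence pair that is common in the literature: a module that precedes another in both sequences lies to its left, and a module that follows another in the first sequence but precedes it in the second lies below it. The function name `pack` and the dictionary-based inputs are illustrative assumptions; the convention and the (incremental) algorithms actually used in this work are the subject of Chapter 6 and may differ.

```python
def pack(first, second, widths, heights):
    """Return {module: (x, y)} lower-left corners for a sequence pair.

    Assumed convention (common in the literature, not taken from this text):
      p before m in both sequences              -> p is left of m
      p after m in first, before m in second    -> p is below m
    This is the straightforward O(n^2) evaluation, not an incremental one.
    """
    pos_first = {m: i for i, m in enumerate(first)}
    x, y = {}, {}
    # Every module that constrains m (left of it or below it) precedes m in
    # the second sequence, so its coordinates are final when m is processed.
    for idx, m in enumerate(second):
        x[m], y[m] = 0, 0
        for p in second[:idx]:
            if pos_first[p] < pos_first[m]:
                x[m] = max(x[m], x[p] + widths[p])    # p is left of m
            else:
                y[m] = max(y[m], y[p] + heights[p])   # p is below m
    return {m: (x[m], y[m]) for m in second}


widths = {1: 4, 2: 3, 3: 2}
heights = {1: 2, 2: 5, 3: 1}
placement = pack([1, 2, 3], [2, 1, 3], widths, heights)
print(placement)
```

In this small instance module 2 ends up below module 1, and module 3 to the right of both. Swapping two elements in one of the sequences and calling `pack` again gives the neighboring placement that a perturbation would produce.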
4.2.2 Routing
In order to handle complexity, the routing approach is split into separate steps. We adopt a two-step approach: a global routing step followed by a detailed routing step. The reasons for this choice are twofold.
1. It is too costly to compute the entire detailed routing information during each optimization iteration. Furthermore, it is intuitively clear that computing very detailed routing information is a waste of resources when the placement is not even close to being final.
2. It is doubtful whether a single-step approach can yield good solutions in a reasonable amount of time for larger problem instances, as contrasted with a two-step approach. Furthermore, a coarse first step can yield enough information to guide both the second refinement step and a possible local adjustment in placement if necessary.

Global Routing

From a placement, a set of rectilinear wires connecting all modules in a net can be computed for all nets. The accuracy and the associated computational effort can be traded off against each other. Global routing serves two main purposes in layout generation, both of which are especially meaningful in the context of mixed-signal designs:
1. All modules should be connected in such a manner that all constraints on the pins in a net are met; otherwise the placement is inadequate. For the sake of simplicity, we assume that a Steiner minimal tree connecting all pins in a net implies adherence to the previous condition.
2. Enough routing space should be reserved for detailed routing along the sides of the modules. At least the minimum amount of space can be computed using global wiring information in addition to pin, net, and design rule information. Furthermore, the requirement to minimize performance degradation due to crosstalk between adjacent wires belonging to different nets increases the minimum spacing.

In almost all integrated placement and routing approaches, only the first item is considered, and in virtually all cases a very crude global routing approach is taken. A de-facto standard routing estimation methodology is minimal bounding box (MBB), or half-perimeter, routing. The main reason for doing so is ease of implementation. However, apart from the fact that a coarse routing yields, by definition, routing estimations with large deviations from an optimal routing solution, we also observe the following fundamental disadvantages:
No (performance-driven) wire spacing and routing space estimation can be employed due to lack of spatial information.
The coarse routing values might actually conflict with (near) optimal routing values in the sense that the former might indicate that a certain placement induces a better routing while it is actually worse. An important consequence is a dramatic deterioration of optimization results and, likely, of optimization convergence.
As a result, an accurate global routing methodology is proposed here, based on sparse routing graphs and fast and efficient Steiner minimal tree approximation heuristics. Chapter 7 elaborates extensively on global routing.

Detailed Routing

When problem instances become large, it is infeasible for a single-step routing approach such as classical area routing to determine exactly the spatial properties of each wire for all the nets in one sweep. The problem needs to be made manageable somehow. A hierarchical or multi-step approach is a common way to manage complexity. Within an iterative framework, a multi-step approach is particularly advantageous because computation time
can be reduced by early detection of low-quality placements for which no adequate routing can be found. Another advantage of a multi-step approach, which in our case resolves to a two-step approach consisting of global routing followed by detailed routing, is that detailed routing in itself is a very hard problem which can be mitigated by a priori obtained global routing information. Although a solution for detailed routing is not proposed in this thesis, its relevance to high-quality mixed-signal layout generation should be clear.
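The minimal bounding box (half-perimeter) estimate discussed under global routing can be sketched in a few lines of Python. The function name `half_perimeter` and the representation of a net as a list of pin coordinates are assumptions made for illustration. The sketch also shows why the estimate carries no spatial information: it returns a single number per net and says nothing about where the wires run.

```python
def half_perimeter(pins):
    """Half-perimeter wire length of one net.

    pins: list of (x, y) pin coordinates (illustrative representation).
    The value is a lower bound on the length of any rectilinear tree
    connecting the pins; it is exact for nets with two or three pins.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(0, 0), (3, 4), (1, 1)]
print(half_perimeter(net))  # 7
```

For nets with many pins the bound can be far below the length of a Steiner minimal tree, which is one source of the large deviations mentioned above.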
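P1 $\alpha$-swap($i$, $j$): interchange elements $i$ and $j$ in the first sequence $\alpha$ of the pair.
P2 $\beta$-swap($i$, $j$): interchange elements $i$ and $j$ in the second sequence $\beta$ of the pair.
P3 [...]
P4 [...], in clockwise direction.
P5 [...] $x$-axis ([...]) or $y$-axis ([...]).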
Perturbation operators P1, P2, and P3 form a complete set in the sense that, within a finite number of steps, any sequence-pair configuration can be obtained from an arbitrary starting solution. We state this more precisely. From permutation theory we know that every sequence which is a permutation of $n$ elements can be written as a product of disjoint cycles, which is unique except for the order of the cycles, and that each cycle can in turn be written as a product of 2-cycles, which we call swaps.

Lemma 1 Given two arbitrary permutations $\pi$ and $\pi'$ of all elements in $\{1, 2, \ldots, n\}$. Exactly $n - 1$ swaps are needed to go from configuration $\pi$ to $\pi'$ (and vice versa), in the worst case.

Proof The sufficiency condition follows from the fact that there is always a swap that puts at least one element in the right place. That element is left untouched afterwards. Furthermore, the last swap necessarily puts the last two elements in place. Thus, we never need more than $n - 1$ swaps. The necessity condition follows from inductive reasoning. It is easy to see that

$$\pi_3 = s_{i_1 j_1} \circ s_{i_2 j_2}$$

holds and is minimal, i.e. the left-hand side cannot be represented with fewer than two swaps in the worst case, and these swaps are unique. Here $\pi_k$ denotes a (worst-case) permutation of $k$ elements, $\circ$ represents concatenation, and $s_{ij}$ is a swap of elements $i$ and $j$. Let

$$\pi_k = s_{i_1 j_1} \circ s_{i_2 j_2} \circ \cdots \circ s_{i_{k-1} j_{k-1}} \qquad (4.1)$$

be a minimal swaps representation of a permutation with $k$ elements. For $k = 3$ this obviously holds. For a permutation with $k + 1$ elements we can write

$$\pi_{k+1} = \pi_k \circ s_{i_k j_k}. \qquad (4.2)$$

Thus, for a (worst-case) permutation of $k + 1$ elements we need at least $k$ swaps. Therefore, for a permutation of $n$ elements we need $n - 1$ swaps, in the worst case.

Theorem 1 From a given arbitrary sequence pair $(\alpha, \beta)$, we can create any other sequence pair $(\alpha', \beta')$ using at most $2(n - 1)$ perturbations from the perturbation set $\{\text{P1}, \text{P2}, \text{P3}\}$.

Proof Applying perturbation P1 (P2) on sequence $\alpha$ ($\beta$) guarantees finding $\alpha'$ ($\beta'$) within $n - 1$ swaps with the aid of Lemma 1. In principle, perturbation operator P3 is redundant since it is a concatenation of P1 and P2. However, P3 is an intuitively attractive perturbation operator which is fully symmetrical.
Moreover, it helps reduce the diameter of the search space, because the number of required swaps is typically lower when P3 is available. Perturbations P4 and P5 do not change the sequence pair. Perturbation P4, however, influences the absolute location of modules, while the only purpose of P5 is to minimize wire length.
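The sufficiency argument of the lemma above (each swap fixes at least one element, and the last swap fixes two) can be illustrated with a short Python sketch. The function names `swaps_between` and `apply_swaps` are hypothetical and the code is only a demonstration of the counting argument, not an algorithm taken from this work.

```python
def swaps_between(source, target):
    """Return element swaps (a, b) that turn source into target.

    Positions are fixed from left to right; each swap puts at least one
    element in its final place, so at most n - 1 swaps are produced.
    """
    current = list(source)
    where = {v: i for i, v in enumerate(current)}
    swaps = []
    for i, wanted in enumerate(target):
        if current[i] != wanted:
            j = where[wanted]
            displaced = current[i]
            current[i], current[j] = current[j], current[i]
            where[wanted], where[displaced] = i, j
            swaps.append((displaced, wanted))
    return swaps


def apply_swaps(sequence, swaps):
    """Apply element swaps (a, b) to a copy of sequence."""
    result = list(sequence)
    for a, b in swaps:
        i, j = result.index(a), result.index(b)
        result[i], result[j] = result[j], result[i]
    return result


source, target = [1, 2, 3, 4], [2, 3, 4, 1]
swaps = swaps_between(source, target)
print(swaps)                       # [(1, 2), (1, 3), (1, 4)]
print(apply_swaps(source, swaps))  # [2, 3, 4, 1]
```

The target in this example is a single cycle through all four elements, which is a worst case: exactly $n - 1 = 3$ swaps are needed. Applying the same procedure to both sequences of a pair gives the $2(n - 1)$ bound.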
$$\exp(-\Delta C / T) > \text{random}$$

where $\Delta C$ is the change in cost caused by the move, $T$ is the current temperature, and random is a random number generator which generates a real value between 0 and 1 with a uniform distribution. As a consequence, at high temperatures almost all cost increases are accepted, effectively turning the algorithm into a random walk. At low temperatures mostly only cost decreases are accepted. The generation function (also called the selection function) is taken to be the identity function in the proposed framework, adopting the standard approach that is suggested in [35]. Although no attention is given to fine-tuning this parameter, it should be noted that its impact may be significant as it embodies the effective solution space sampling behavior. In other words, generating moves which are going to be rejected with high probability is inefficient, so avoiding such moves is efficient. Of course, this is only practically effective when such moves can be identified relatively quickly, for instance by means of a distance association.
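The acceptance behavior described above can be sketched as follows. This is the standard Metropolis-style test, written here under the assumption that the acceptance function compares $\exp(-\Delta C / T)$ with a uniform random number; the function name `accept` and the optional `rnd` argument (which makes the function testable) are illustrative.

```python
import math
import random


def accept(delta_cost, temperature, rnd=None):
    """Decide whether a move with cost change delta_cost is accepted.

    Cost decreases are always accepted. A cost increase is accepted when
    exp(-delta_cost / temperature) exceeds a uniform random number in [0, 1).
    rnd may be passed explicitly to make the decision reproducible.
    """
    if delta_cost <= 0:
        return True  # also avoids overflow of exp() for large negative ratios
    if rnd is None:
        rnd = random.random()
    return math.exp(-delta_cost / temperature) > rnd


# High temperature: a cost increase is almost always accepted.
print(accept(1.0, 1e9, rnd=0.5))   # True
# Low temperature: the same increase is rejected.
print(accept(1.0, 1e-3, rnd=0.5))  # False
```

The two calls reproduce the limiting cases in the text: a random walk at high temperature and near-greedy descent at low temperature.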
initial temperature: The initial temperature should be high enough to guarantee that the final solution is independent of the initial solution.
temperature decrement: The temperature decrement can be deterministic or stochastic. It is as yet unknown which type of temperature decrement yields the best results.
final temperature: The final temperature can be fixed or dynamically computed as a function of several optimization parameters, such as the estimated standard deviation of the cost.
In our optimization framework we adopted the strategy of Otten and Van Ginneken [35, Chapters 8 and 11].
in the light of the knowledge that it is always possible to improve performance by tuning, and by virtue of the fact that our goal is to demonstrate feasibility of concepts.
$$C = w_A N_A A + w_W N_W W + w_X N_X X \qquad (4.3)$$

where $w_A$, $w_W$, and $w_X$ are user-specified weight factors between 0 and 1 that determine the relative importance of each term in the cost function. The normalization constants $N_A$, $N_W$, and $N_X$ are determined in such a way that the weight factors have equal importance. Furthermore, $A$ stands for chip area, $W$ stands for wire length, and $X$ stands for coupling impact. The cost function given by (4.3) is a generalization of the de-facto standard cost function used in literature, where all normalization constants are typically set to unity.
computation times and worse solutions. For this reason, it would intuitively be better to use a single cost term in which no inherent conflict is apparent, and which captures the essence of the designer's specifications. For instance, this could be accomplished by taking only the total wire length into account, since short wire length normally means that blocks are placed close together. In cases where various placements of blocks exist with the same total wire length, the placement with the smallest chip area should be taken. This could be accomplished by using an additional area term with a small weight value. Another way is to translate wire length into wire area and implicitly incorporate this into the total chip area by expanding the modules in the placement. This approach is already quite sophisticated. For the sake of comparability with other published results, we will adhere to the general approach in which essentially both chip area and wire length are taken into account. However, experimental results with a single cost term are also given in Chapter 7.
Chapter 5
finiteness: the algorithm stops after a finite number of steps;
correctness: the output of the algorithm complies with the pre-specified post-condition;
efficiency: the number of primitive computer operations used to accomplish the desired mapping is as small as possible (within the limitations of the employed data structures).
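$$O(g(n)) = \{\, f(n) \mid \exists\, c > 0,\ n_0 > 0 : 0 \le f(n) \le c\, g(n) \ \ \forall\, n \ge n_0 \,\} \qquad (5.1)$$

Similarly, the $\Omega$-operator (Big Omega) and $\Theta$-operator (Big Theta) are defined:

$$\Omega(g(n)) = \{\, f(n) \mid \exists\, c > 0,\ n_0 > 0 : 0 \le c\, g(n) \le f(n) \ \ \forall\, n \ge n_0 \,\} \qquad (5.2)$$

$$\Theta(g(n)) = \{\, f(n) \mid \exists\, c_1, c_2 > 0,\ n_0 > 0 : 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \ \forall\, n \ge n_0 \,\} \qquad (5.3)$$

In words, the above relations indicate that if $f(n) = O(g(n))$, $f(n)$ is bounded from above by $g(n)$ multiplied by a suitable constant, when $n$ is sufficiently large. Furthermore, if $f(n) = \Omega(g(n))$, $f(n)$ is bounded from below by $g(n)$ multiplied by another suitably chosen constant, for sufficiently large $n$. It is not difficult to see that for any two functions $f(n)$ and $g(n)$, $f(n) = \Theta(g(n))$ if and only if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.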
Graphic examples are shown in Figure 5.1. Note the abuse of the equality sign to denote member of a set. It is a standard convention in asymptotic analysis.
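As a small worked example of these asymptotic bounds, consider the (arbitrarily chosen) function $f(n) = 3n^2 + 5n$. The constants below are one possible choice; any larger upper constant or later threshold works equally well.

```latex
\begin{align*}
f(n) &= 3n^2 + 5n \\
     &\le 3n^2 + 5n^2 = 8n^2 && \text{for all } n \ge 1
        \quad\Rightarrow\quad f(n) = O(n^2) \text{ with } c = 8,\ n_0 = 1, \\
f(n) &\ge 3n^2 && \text{for all } n \ge 1
        \quad\Rightarrow\quad f(n) = \Omega(n^2) \text{ with } c = 3,\ n_0 = 1, \\
\text{hence}\quad f(n) &= \Theta(n^2) && \text{with } c_1 = 3,\ c_2 = 8,\ n_0 = 1.
\end{align*}
```

The lower-order term $5n$ only influences the constants, not the asymptotic class.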
Worst-case analysis is by far the most used analysis approach. An important reason for using worst-case analysis is that the occurrence of a problem instance that induces worst-case behavior might be disastrous. Maybe an even more important reason is that worst-case analysis is typically far easier to perform than other types of analysis. However, if the worst-case situation does not occur often, the analysis results might deviate severely from those of a more elaborate type of analysis. Average-case analysis is concerned with the average computational complexity of an algorithm for a specific set of inputs. This type of analysis is most accurate, but also very difficult to perform in practice. Moreover, the analysis results depend on the assumed distribution of input problem instances. To simplify the analysis, often a uniform input distribution is chosen. Unfortunately, this may not always be a good assumption. In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. This type of analysis can be used to show that
the average cost of an operation is small if one averages over a sequence of operations, even though a single operation might be expensive. Amortized analysis differs from average-case analysis in that probability is not involved in the sense that no assumptions about the input distribution are made; there is only averaging over time. The averaging occurs over a worst-case sequence of operations. For more information, we refer the reader to [34].
point finding: find the rectangle in which a given point is located;
neighbor finding: return all rectangles that touch a given side of a given rectangle;
insert rectangle: insert a rectangle of given width and height at a given location;
delete rectangle: delete a rectangle from a given position;
area search: check if there are any rectangles of a certain type in a given area;
area enumerate: enumerate all rectangles of a certain type in a given area.
A striking feature of the corner stitching data structure is its ability to represent both empty and non-empty regions in the plane. This notion can be generalized to more than two types of rectangles if necessary, without incurring any performance loss in terms of computational complexity. In fact, the corner stitching data structure is a generalization of the doubly-linked list data structure to two dimensions, where each list item covers a part of the plane. Figure 5.3 shows an example of a set of rectangles in the plane represented by the corner stitching data structure. Notice the white area which represents unoccupied space, whereas the shaded area represents occupied area. The corner stitches are shown explicitly in the rectangular dashed-outline region. When browsing through the data structure, these corner stitches are used to go to a neighboring rectangle. From this figure it is also clear that examining physically close rectangles is a local operation which can be performed very fast.

Figure 5.3: A placement of rectangles in the plane explicitly represented using the corner stitching data structure.

Another feature of the corner stitching data structure is a property called maximally horizontal empty tile. This means that an empty tile is always maximally extended in horizontal direction, which is also shown in Figure 5.3 where the white unoccupied area is split into maximally horizontal (empty) rectangles. The corner stitching data structure performs well in practice due to its relatively simple structure. However, its implementation requires a lot of care to avoid some tricky pitfalls. Its
actual performance can vary quite a lot. For example, inserting a set of rectangles in the plane is faster if less segmentation of the plane is induced around a rectangle during insertion. Thus, the order of insertion plays a role. Typically it is more advantageous to place the larger rectangles first and then the smaller ones. The rationale behind this is that large rectangles can shield a larger portion of the plane from other parts so that less interaction is required. The reader is referred to [52, 15] for more information. We conclude by giving Table 5.1 of relevant corner stitching operations. From this table we can see that a few operations can typically be performed in constant time, independent of the number of items already inserted into the data structure. Although most operations have a worst-case complexity of $O(n)$, where $n$ is the total current number of rectangles in the plane, this seldom occurs in practice. Normally, searching for a certain module in the data structure requires a considerable amount of effort; $O(\sqrt{n})$ on average. Typically, the actual number of relevant rectangles is equal to

$$n \cdot \frac{\text{(selected area)}}{\text{(total rectangle area)}} \qquad (5.4)$$

It is also shown in the table that a hint, which is an auxiliary pointer to some object (empty or non-empty) in the data structure, can significantly improve average computational complexity. Of course, the strength of a hint is only unleashed when it is chosen in such a way that it provides maximum gain. In practice this means that computing a hint should be much more efficient than the average complexity of the operation without a hint.

Table 5.1: Corner stitching operations and computational complexities. A hint is an auxiliary pointer to a proper object in the data structure.
Columns: operation; av. comp. complexity; av. comp. complexity with hint.
Rows: insert(); delete(); neighbor enumeration(); area enumeration; point finding.
A worst-case sequence of operations will provide more insight on the issue where the break-even point lies. If we know in advance the (approximate) maximum number of objects which are going to be stored within the corner stitching data structure, a hash table can be used to improve some of the average complexities without a hint. The approach is as follows. Create a hash table which can store the objects indexed by their, say, bottom-left coordinates. As this implies each existing object can be found in constant time, the operations that involve a specic object to be found prior to performing the actual operation, can be decreased in complexity.
each other in a sequential one-directional fashion. A somewhat more sophisticated list is the doubly-linked list, where the data items are connected to each other in a bi-directional way. Conventional lists are useful when dynamic sets need to be maintained, and the primary operations are insertion of an element and deletion of an element. Also, enumeration of all elements in the set can be performed efficiently. Lists are not efficient when a specific element needs to be looked up in the set, because every element before that element in the list has to be looked at. For the lookup operation we need $O(n)$ time in the worst case. Unfortunately, this is also the average-case computational complexity. Table 5.2 contains computational complexities of list operations. Note that deleting an arbitrary item requires a find() operation before the actual deletion. Only deleting an item with a known location, for instance at the head or tail of a list, can be done in constant time. The same holds for inserting an arbitrary item at the head or tail of the list. Note that operations on items with a known location in the list can be performed in constant time with the aid of a hash table. Of course, this approach is only useful when the maximum number of items in the list can be estimated beforehand and this number is much smaller than the universe of storable items.

Table 5.2: List operations and computational complexities.
Columns: operation; computational complexity.
Rows: insert(); delete(); find(); enumerate.

Recently, a more powerful variant of the list-based data structure has been introduced by Pugh [53]. It is called the skip list and it was proposed as an efficient alternative to balanced trees. The key ingredients in a skip list are a logarithmic number of levels containing data items and a probabilistic approach to skip pointers. Skip lists appear to have very good performance in practice and can do whatever a balanced tree can do, at least as fast.
Where balanced trees become inefficient when objects are frequently inserted into and deleted from the set, skip lists avoid expensive re-balancing operations after each modification of the set. Last but not least, skip lists are easy to implement. Figure 5.4 shows the basic notion behind the skip list data structure. In Figure 5.4(a) a conventional linked list is shown. In order to reduce searching time, an additional pointer is introduced with every other object. Each such pointer skips one object. The result is that searching time is reduced by half. This idea can be applied to every fourth ($2^2$), every eighth ($2^3$), every sixteenth ($2^4$) pointer, and so on. Generally the maximum number of pointer levels is chosen as $\log_2 n$. It is now clear that each element can be found in $O(\log n)$ time using classical binary search principles. However, inserting or deleting an item while maintaining the skip list properties can be very awkward. This problem is solved by Pugh using a probabilistic approach. The skip list data structure can still degenerate into a linked list, but that probability is extremely small for any reasonable size of $n$. Table 5.3 shows the computational complexities of skip list operations. In a development environment, however, it may be desirable to exactly reproduce results or to compare results after only one specific setting has been changed. With a probabilistic data structure it might be troublesome to judge the impact of a change when the data structure performance also changes. Therefore, a deterministic algorithm might be preferable under these circumstances.

Figure 5.4: The structure of the skip list is essentially a generalization (b) of the linked list structure (a). If probability is added, a more irregular structure is obtained (c), but typically it is very efficient for insertion, deletion and searching.

Table 5.3: Skip-list operations and computational complexities.
Columns: operation; av. computational complexity.
Rows: insert(); delete(); find(); successor(); predecessor(); enumerate.
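The probabilistic skip list can be sketched compactly in Python. This follows the general scheme described by Pugh (random node levels, search from the top level down); the class name `SkipList`, the level probability of 1/2, the maximum of 16 levels, and the `seed` parameter are assumptions made for this illustration. Passing a fixed seed is one simple way to obtain the reproducibility that the text asks for in a development environment.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # one successor pointer per level


class SkipList:
    MAX_LEVEL = 16  # assumed cap; enough for roughly 2**16 keys

    def __init__(self, seed=None):
        self._rng = random.Random(seed)  # fixed seed gives repeatable shape
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """Per level, the last node whose key is smaller than key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def find(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # keys are kept unique in this sketch
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):
            if update[i].forward[i] is node:
                update[i].forward[i] = node.forward[i]
        return True

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


s = SkipList(seed=1)
for k in [21, 3, 13, 7, 34, 5]:
    s.insert(k)
print(list(s))     # [3, 5, 7, 13, 21, 34]
print(s.find(13))  # True
```

No re-balancing step appears anywhere: insertion and deletion only splice pointers on the levels of the affected node.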
Figure 5.5: An example of a binary search tree in (a) a classical tree representation, and (b) an equivalent full binary tree representation used in [54].

a linked list, and performance plummets. A great amount of work has been spent on finding tree-balancing algorithms and techniques to overcome the effect of degeneration. The result is a colorful set of balanced binary tree algorithms: B-trees, AVL trees, red-black trees, randomized binary trees, splay trees, and many more. The splay tree data structure is a very efficient data structure in that it has $O(\log n)$ amortized computational complexity per operation, where the time per operation is averaged over a worst-case sequence of operations. Essentially, each splaying operation, which is a simple restructuring heuristic, resembles a move-to-front technique of the splayed item plus a shortening of the height of the current tree. Exactly three different splaying cases can occur. These cases are shown in Figure 5.6. To splay a tree at a node $x$, we repeat the aforementioned primitive splaying operations until $x$ is the root of the tree. Splaying a node at depth $d$ takes $\Theta(d)$ time [54], that is, time proportional to the time to access node $x$. Splaying not only moves $x$ to the root, but roughly halves the depth of every node along the access path. This halving effect makes splaying efficient. Note that splaying, and consequently a splay tree, is fully deterministic. It is clear that under some conditions of access, insertion, and deletion probabilities over the universe of elements, the splay tree data structure can perform substantially better than in the worst case. From experiments [55] and experience in the field, especially with respect to randomized binary search trees [56], splay trees typically outperform other balanced tree implementations. Therefore, we have chosen splay trees as our primary balanced tree data structure for implementation. Table 5.4 shows the computational complexities of splay tree operations.
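The three splaying cases (zig, zig-zig, zig-zag) can be written down as a short recursive routine. The sketch below is a textbook-style recursive formulation chosen for brevity, not the implementation used in this work; the names `Node`, `splay`, `insert` and `inorder` are illustrative. If the key is absent, the last node on the search path is brought to the root.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def _rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root, key):
    """Bring key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:        # zig-zig: rotate grandparent first
            root.left.left = splay(root.left.left, key)
            root = _rotate_right(root)
        elif key > root.left.key:      # zig-zag: double rotation
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = _rotate_left(root.left)
        return root if root.left is None else _rotate_right(root)  # zig
    if root.right is None:
        return root
    if key > root.right.key:           # zig-zig (mirror image)
        root.right.right = splay(root.right.right, key)
        root = _rotate_left(root)
    elif key < root.right.key:         # zig-zag (mirror image)
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = _rotate_right(root.right)
    return root if root.right is None else _rotate_left(root)      # zig


def insert(root, key):
    """Insert key and return the new root (the inserted node)."""
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:
        return root
    node = Node(key)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for k in [13, 3, 34, 7, 21, 5]:
    root = insert(root, k)
root = splay(root, 21)
print(root.key, inorder(root))  # 21 [3, 5, 7, 13, 21, 34]
```

The routine contains no random choices, which matches the remark that a splay tree is fully deterministic.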
Figure 5.6: All three splaying cases. Each case has a symmetric variant which is not shown. The accessed node is $x$. (a) Zig: terminating single rotation. (b) Zig-zig: two single rotations. (c) Zig-zag: double rotation.

Table 5.4: Splay-tree operations and computational complexities. The worst-case and average-case complexities are equal.
Columns: operation; computational complexity.
Rows: insert(); delete(); find(); successor(); predecessor(); enumerate.

table is $O(1)$. In fact, a hash table is a generalization of an ordinary array in which direct addressing is
performed in a clever way. A hash table becomes especially interesting if the number of keys to be stored at any moment in time is small compared to the size of the key space. Instead of using the key directly to access a position in the array, the array index is computed from the key, which is called hashing. This way, the size of the array can be kept proportional to the number of keys instead of the size of the key space, as is the case for ordinary array storage. Figure 5.7 graphically shows the principle of hashing. Keys from the universe of keys $U$ are mapped to the array $T$ using a hash function $h$, with $h : U \to T$. Due to the fact that the size of $T$ is much smaller than the size of $U$, and the hash function is not perfect in the sense that it does not know in advance which keys from $U$ are going to be stored in $T$, collisions can occur. That is, some keys will be mapped to the same slot position in $T$. An efficient way to resolve collisions is by means of chaining, i.e. keeping colliding keys in a list. In the example shown, two keys collide and are chained.

Figure 5.7: The principle of hashing where collisions are resolved by chaining. (Labels: universe of keys; used keys.)

The essential elements for an efficient hash table implementation are the hash function and the capacity handling of the hash table. A hash function

$$h : U \to \{0, 1, \ldots, m - 1\} \qquad (5.5)$$

is said to hash an element $k \in U$, where $U$ denotes the key space, to slot $h(k)$ in the hash table. Since hashing is performed during every hash table operation, the hash function needs to evaluate quickly and have good distributing properties. Knowledge of the probability distribution of the input elements facilitates the construction of a good hash function. Typically, heuristic techniques are employed for this purpose. In the case where not much is known about the input distribution, except for the fact that it is quite unpredictable, general approaches can be taken. A common technique is the division method, yielding for instance

$$h(k) = k \bmod p \qquad (5.6)$$

where $p$ should preferably be a prime number at least as large as the number of slots in the table, and not too close to exact powers of 2. Instead of mapping a single key, it is also easy to map a pair of keys $(x, y)$, which can also be interpreted as a point in the plane, into a hash table. The notion of double hashing fits
[...] and [...] (5.7)

where $p$ is prime and $p'$ is a positive integer smaller than $p$, for instance $p' = p - 1$. Because generally we do not know which elements of $U$ are going to be stored in the hash table, by definition so-called collisions will occur. An effective way to handle collisions is by means of chaining. The chaining principle essentially turns each slot in the hash table into a linked list. If the size of the hash table is well-chosen, the expected length of a chain is very small and does not depend on the number of stored keys $n$. Regardless of whether or not collision resolution is employed, the number of slots in a hash table needs to be large enough to avoid deterioration of performance. If the set of elements that is going to be stored is known in advance, a so-called perfect hash function can be computed which guarantees a one-to-one mapping in the hash table. Of course this is a trade-off between the performance gained by avoiding collisions and the effort needed to compute a perfect hash function. Table 5.5 shows the computational complexities of hashing operations. Collision resolution by chaining is especially attractive when the number of keys in the hash table approaches the number of slots in the hash table; inserting a new key always takes $O(1)$. However, when the number of keys in the hash table grows significantly larger than the hash table size, finding and deleting a key become proportionally slower. If a truly dynamic set has to be maintained and the number of items in the set is unknown in advance, then a hash table is likely not the best choice.

Table 5.5: Hash table operations and computational complexities. Note that $m$ is the number of slots in the hash table and not the number of elements currently inserted in the table.
Columns: operation; av. computational complexity.
Rows: insert(); delete(); find(); enumerate.
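Hashing with the division method and collision resolution by chaining can be sketched as follows. The class name `ChainedHashTable`, the default of 13 slots, and the use of Python lists as chains (instead of linked lists) are assumptions made for this illustration. Unlike the constant-time insertion mentioned in the text, `insert` here first scans the chain so that an existing key is updated rather than duplicated.

```python
class ChainedHashTable:
    """Hash table for integer keys: division method plus chaining."""

    def __init__(self, slots=13):
        # A prime slot count, not close to a power of 2, spreads keys well.
        self._slots = [[] for _ in range(slots)]

    def _index(self, key):
        return key % len(self._slots)  # division method

    def insert(self, key, value):
        chain = self._slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # update an existing key
                return
        chain.append((key, value))

    def find(self, key):
        for k, v in self._slots[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def chain_length(self, key):
        """Length of the chain that key hashes to (for inspection)."""
        return len(self._slots[self._index(key)])


table = ChainedHashTable(slots=7)
table.insert(3, "a")
table.insert(10, "b")          # 10 mod 7 == 3: collides with key 3
print(table.find(10))          # b
print(table.chain_length(3))   # 2
```

With far more keys than slots the chains grow and `find` and `delete` slow down proportionally, which is the degradation described above.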
insert, which inserts an element into the set $S$;
minimum, which returns the element of $S$ with the smallest key;
extract min, which removes and returns the element of $S$ with the smallest key.
A data structure with the aforementioned properties is called a priority queue. One application of priority queues is to schedule jobs on a shared computer. It also has great utility in VLSI design problems. Most (practical) implementations of efficient priority queues have amortized computational complexity bounded by $O(\log n)$, where $n$ is the current number of elements in the queue.
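The three priority queue operations map directly onto a binary heap. The sketch below uses Python's standard `heapq` module; the wrapper class `PriorityQueue` and the insertion counter used as a tie-breaker are assumptions made for this illustration, not a description of the implementation used in this work.

```python
import heapq
import itertools


class PriorityQueue:
    """Minimal priority queue on top of a binary heap (heapq)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal keys

    def insert(self, key, item=None):
        # O(log n): the entry sifts up through the heap.
        heapq.heappush(self._heap, (key, next(self._counter), item))

    def minimum(self):
        # O(1): the smallest entry is always at index 0.
        key, _, item = self._heap[0]
        return key, item

    def extract_min(self):
        # O(log n): the last entry is moved to the root and sifts down.
        key, _, item = heapq.heappop(self._heap)
        return key, item

    def __len__(self):
        return len(self._heap)


queue = PriorityQueue()
for key, item in [(5, "e"), (1, "a"), (3, "c")]:
    queue.insert(key, item)
print(queue.minimum())      # (1, 'a')
print(queue.extract_min())  # (1, 'a')
print(queue.extract_min())  # (3, 'c')
```

The counter ensures that items with equal keys are never compared with each other, so arbitrary objects can be stored.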
Chapter 6
Placement
When a circuit has been designed in terms of a netlist connecting (properly sized) building blocks, the layout phase follows. This part of the design cycle is called physical design, and for contemporary mixed-signal designs this phase is becoming increasingly important. In fact, it is a dominant limiting performance factor of any state-of-the-art integrated circuit. Two important issues in physical design are placement and routing. This chapter focuses on the placement problem. First we define the placement problem. Then we give an overview of several approaches to solve it. Based on our requirements on placement quality and on placement-related issues such as substrate coupling and matching, a choice is made regarding the approach for tackling the placement problem. We elaborate on an efficient placement representation, known as the sequence-pair structure. Its theoretical properties are discussed in detail. Moreover, we unify new findings with known theories and algorithms. Fundamental theoretical lower limits on computational complexity are given with respect to state-of-the-art approaches to placement computation using the sequence pair representation. Motivated by promising theoretical results, an incremental placement computation approach is devised which has very attractive features in a simulated annealing optimization environment. Experimental results are shown to demonstrate the effectiveness and efficiency of the incremental approach. We proceed by discussing an important extension to standard placement: constrained module placement, in which modules can be constrained to a prescribed location in the plane or forced to be placed at one of the chip boundaries. An improved robust approach is proposed, and its effectiveness and superiority over the latest published works are demonstrated by experiments. Let us first specify more exactly what is meant by a placement.
Definition 6 (Placement) A set of given rectangular blocks which are placed in a two-dimensional plane is called a placement.

Since no restrictions are put on possible overlap of blocks, clearly not every placement is practical. Therefore, a feasible placement is defined here as follows.

Definition 7 (Feasible Placement) A placement in which no overlap of blocks occurs is called a feasible placement; otherwise it is called infeasible.
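The feasibility definition translates into a simple pairwise overlap test. The sketch below assumes that each block is given as a tuple `(x, y, width, height)` with `(x, y)` its lower-left corner, and that blocks which merely touch along an edge do not overlap; both the representation and the names `overlaps` and `is_feasible` are illustrative. The test is quadratic in the number of blocks and is meant to clarify the definition, not to be efficient.

```python
from itertools import combinations


def overlaps(a, b):
    """True if rectangles a and b share interior area.

    Each rectangle is (x, y, width, height); abutting rectangles that only
    share an edge or a corner are not considered overlapping.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_feasible(blocks):
    """A placement is feasible when no pair of blocks overlaps."""
    return not any(overlaps(a, b) for a, b in combinations(blocks, 2))


print(is_feasible([(0, 0, 2, 2), (2, 0, 2, 2)]))  # True: blocks abut
print(is_feasible([(0, 0, 2, 2), (1, 1, 2, 2)]))  # False: blocks overlap
```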
Placement
The blocks that are used in a placement are normally of fixed size, but it is also possible to take blocks with flexible sizes. Those flexible blocks, also called soft blocks, can be taken from a given set of candidate blocks, under an aspect ratio constraint, or some other mathematical function constraint. Here, the placement problem is defined as follows.1
Problem: The placement problem
Instance: A set of blocks of given sizes. A set of pins, of which a subset is at the circumference of each of the blocks, representing the connectivity information between the blocks. An objective function which, for instance, captures the total length of interconnecting wires and/or the area of the smallest enclosing rectangle around all blocks.
Solutions: All feasible placements, with all possible orientations of the blocks.
Minimize: The objective function.
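As a rough illustration of such an objective function, the sketch below combines the area of the smallest enclosing rectangle with a wire-length estimate. The half-perimeter estimate, the weights `alpha` and `beta`, and all function names are assumptions made for this example; the text does not prescribe a particular formula.

```python
def bounding_box_area(blocks):
    """Area of the smallest enclosing rectangle; each block is (x, y, w, h)."""
    x_min = min(x for x, _, _, _ in blocks)
    y_min = min(y for _, y, _, _ in blocks)
    x_max = max(x + w for x, _, w, _ in blocks)
    y_max = max(y + h for _, y, _, h in blocks)
    return (x_max - x_min) * (y_max - y_min)


def half_perimeter_wire_length(nets):
    """Sum over all nets of the half perimeter of the pins' bounding box.

    Each net is a list of absolute pin positions (x, y). This is a common
    estimate of wire length, used here as an assumption.
    """
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def cost(blocks, nets, alpha=1.0, beta=1.0):
    """Weighted sum of area and estimated wire length (weights are assumed)."""
    return alpha * bounding_box_area(blocks) + beta * half_perimeter_wire_length(nets)


if __name__ == "__main__":
    blocks = [(0, 0, 2, 1), (2, 0, 1, 3)]
    nets = [[(1.0, 0.5), (2.5, 1.5)]]  # one two-pin net
    print(cost(blocks, nets))
```

In a simulated annealing setting a function of this kind would be evaluated once per move, which is why the cost of computing block coordinates matters so much in the remainder of the chapter.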
The classical term floorplanning is strongly related to placement in that it also deals with placement of objects. The approach of floorplanning is different, however, because it divides the two-dimensional plane into rooms which are big enough to hold all (flexible) objects. This way, overlap is avoided by construction. Moreover, empty area is not explicitly represented by a floorplan. Before proceeding, let us define a floorplan.
Definition 8 (Floorplan) A floorplan is a data structure that captures the relative positions of non-overlapping objects that fully cover a certain rectangle in the 2-dimensional plane [60].
The above definition is a sensible special case, in the current context, of the general definition which was re-coined by Otten recently [60]. Consequently, this notion of a floorplan is similar to relative topological placement representations which can be found in many recent works, e.g. [61, 62, 63, 64]. In this respect, floorplanning can be compared with feasible placement computation using a topological placement representation. The main difference is that typical (feasible) placement computation deals with fixed-size blocks. When variable-size blocks (also called soft blocks) are used instead of fixed-size blocks, the placement problem is generalized into a floorplanning problem. The result of a floorplanning phase is a sized floorplan. The latter is defined as follows.
Definition 9 (Sized Floorplan) A sized floorplan is a floorplan in which each room contains exactly one block, and the block is not larger than the room.
Note that the word floorplan, instead of sized floorplan, is also used in literature to denote the result of a placement phase which contains absolute position and size information. Hereafter, the term placement is used to denote a sized floorplan, even in conjunction with soft blocks. Formally, the floorplanning problem can be defined as follows.
1 We
Problem: The floorplanning problem
Instance: A set of flexible blocks, and a sizing (or shape) function that selects a shape alternative for each block. An objective function which, for instance, captures the total length of interconnecting wires.
Solutions: All sized floorplans, with all possible combinations of shape alternatives, and all possible relative topologies.
Minimize: The objective function.
We will classify placement representations as slicing or non-slicing. The reason for this classification is the obvious difference in generality. Figure 6.1 shows an illustrative example of a non-slicing placement, which is defined as follows.
Definition 10 (Slicing) A placement is slicing if and only if it can be obtained by complete recursive bisection of the placement area. If slicing cannot be recursively continued up to the lowest level, a placement is called non-slicing.
Figure 6.1: An example of a non-slicing placement.
The main incentives for using slicing representations over non-slicing representations are the following.
- Some placement-related problems which are NP-hard for non-slicing placements can be reduced to polynomial-time problems for slicing placements.
- Several useful properties can be attributed to slicing placements, of which conflict-free channel routing sequence application is most prominent.
- A hierarchical design methodology matches well with the slicing floorplan methodology.
Hence, it is clear that both slicing and non-slicing representations have advantages and disadvantages. In the following sections we argue that the so-called sequence pair representation is most suitable for use in a mixed-signal layout generation framework.
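Definition 10 can be illustrated with a small recursive test. The sketch below is only an illustration under stated assumptions: blocks are tuples `(x, y, w, h)` with positive sizes, the placement is feasible, and a cut is any horizontal or vertical line that crosses no block and leaves blocks on both sides. The function name `is_slicing` and the two test placements are hypothetical.

```python
def is_slicing(blocks):
    """Return True if the blocks can be separated by recursive bisection.

    A cut line is valid if every block lies entirely on one side of it and
    both sides are non-empty. Candidate cut positions are the upper edges
    of the blocks along the chosen axis.
    """
    if len(blocks) <= 1:
        return True
    for axis in (0, 1):  # 0: vertical cut (constant x), 1: horizontal cut (constant y)
        for cut in sorted({b[axis] + b[axis + 2] for b in blocks}):
            low = [b for b in blocks if b[axis] + b[axis + 2] <= cut]
            high = [b for b in blocks if b[axis] >= cut]
            if low and high and len(low) + len(high) == len(blocks):
                if is_slicing(low) and is_slicing(high):
                    return True
    return False


if __name__ == "__main__":
    # One block on the left, two stacked blocks on the right: slicing.
    stacked = [(0, 0, 1, 2), (1, 0, 1, 1), (1, 1, 1, 1)]
    # Five blocks in a pinwheel arrangement: the classic non-slicing case.
    pinwheel = [(0, 0, 2, 1), (2, 0, 1, 2), (1, 2, 2, 1), (0, 1, 1, 2), (1, 1, 1, 1)]
    print(is_slicing(stacked), is_slicing(pinwheel))
```

The pinwheel arrangement has no line that crosses the whole placement without cutting a block, so the very first bisection already fails.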
to this complexity it is impractical trying to find solutions of any but the smallest problem instances. It is not our intention to give an exhaustive overview. Firstly, the amount of published literature is too large to describe extensively in this thesis. Secondly, it would lead us too far from the purpose of this section, which is to discuss candidate placement approaches. We refer the reader to good overviews in [14, 15] and the references therein. We use the phrase placement approach instead of placement algorithm because the former is more generic. For instance, a placement could be obtained using a force-directed method with a general (non-slicing, overlap-allowed) representation of blocks. The placement algorithm is then clearly the force-directed method, but the placement approach is the general representation of the blocks which is employed while placing using the force-directed method. Another combination could be to use the force-directed method with a slicing placement representation. Since the representation of a placement has great impact on the performance of a placement algorithm, both in terms of speed and solution quality, it is sensible to discuss this in more detail. Otten [66] was among the first to introduce the notion of floorplanning in the early eighties. Motivated by this concept, researchers began to look for special cases which could be applied to digital VLSI circuits, without limiting design freedom in a negative sense. One of the most prominent special cases was the slicing floorplan structure, for which certain intractable problems reduce to polynomially solvable cases. This important property has been the main reason for using the slicing floorplan approach. An efficient floorplanning approach is described by Wong and Liu in [67]. Initially, the slicing floorplan approach was also applied to analog designs. However, it was soon realized that the slicing structure is too restrictive for analog layout [7].
Consequently, a more general placement approach was adopted by members of the analog layout design community. Actually, the most general placement approach of all was initially used for this purpose: blocks were allowed to be placed at arbitrary positions in a 2-dimensional plane. Thus, the representation allowed overlap of blocks. One of the first works in this respect is due to Jepsen and Gelatt [68]. Subsequent works, which extended and refined the original concept, are due to Sechen [41] and Lampaert [8]. Although the general overlapping placement approach yielded promising results, its fundamental flaws prevented researchers from building a viable mixed-signal layout generation system for larger designs. Efforts to refine implementations and tune the layout system to improve performance have been, and can only be, successful up to an extent. Fortunately, a great deal of research effort has been put into the design of efficient non-slicing placement representations. Murata et al. [48] developed one of the first efficient general placement representations, called the sequence pair structure. Another relevant work is due to Nakatake et al. [63], who developed the bounded slice-line grid structure. Very recently, the O-tree structure was introduced by Guo et al. [62] and independently by Takahashi [64]. A host of extensions and refinements of the original O-tree concept followed rapidly [69, 70]. These representations were soon adopted by others for use in an analog layout generation system [45].
1. the solution space is finite,
2. every solution is feasible,
3. the mapping of a representation into a placement can be performed in polynomial time (P),
4. the solution space contains an optimal solution (admissible).
The first requirement is quite weak because finiteness can have a near-infinite appearance [71]. Requirements two and four are obvious. The third requirement is also quite weak, since polynomial computational complexity includes a linear algorithm, but also an $O(n^k)$ algorithm, where $k$ can be a large constant. As a consequence of the first and the third requirements, we can distinguish various representations within the boundaries of P-admissibility. The computational complexity associated with a complete placement representation is a combination of essentially two properties of the representation: 1) solution space size, and 2) computational complexity for computing a specific placement. Since scalability, i.e. the computational behavior of a system as a function of the input instance size, is becoming increasingly important, the use of asymptotic complexity measures is fully justified. In order to make a proper choice on which type of placement representation to use, it is wise to create an overview of important and relevant representations. The final choice of placement representation is made based on a trade-off between
Let us first restate the requirements for an efficient mixed-signal layout generation tool from a conceptual point of view. First of all, it is well known that matching of both wiring and modules is extremely important in analog circuit layout. Therefore, representations that restrict the generation of matching-aware layouts should not be used. Thus, non-slicing placement representations are more suitable. Second, the system should have good scaling properties, which means that the computational complexity should be as low as possible. Moreover, in the light of an optimization algorithm which is going to be employed to compute a (near) optimal solution, preference might be given to a specific type of representation
which can possibly exploit information efficiency. Third, to achieve efficient usage of computational power and ultimately obtain a high-quality layout in several respects, a better understanding of the mechanisms and parameters that control the overall layout quality is required. Therefore, more insight into the representation, especially with respect to its mathematical properties, is of importance. A major benefit of identification with known mathematics is the possibility to use a host of existing off-the-shelf techniques and algorithms. Table 6.1 gives an overview of known placement representations. It also shows the size of the associated solution space of each representation. Also, the generality of a representation is indicated in the column with heading NS (non-slicing).

Table 6.1: An overview of placement representations and their associated solution space size. It is indicated whether or not a representation can represent a non-slicing (NS) floorplan. PE indicates the computational complexity of a single full placement evaluation for a state-of-the-art implementation.

representation | NS | solution space size | PE | refs.
flat Jepsen-Gelatt | yes | | | [68]
Polish expression | no | | | [66]
normalized Polish expression | no | | | [67]
sequence pair (SP) | yes | | |
bounded sliceline grid (BSG) | yes | | |
ordered tree (O-tree) | yes | | |
labeled ordered tree (LOT) | yes | | |
B*-tree | yes | | |
binary tree | yes | | |
corner block list | yes | | |
topological relation & orientation | yes | | |

From the above table it is clear that there is a big difference in the size of the solution space of the representations. Although an indication is given regarding slicing properties of the representations, it does not cover all aspects of a flexible placement representation. This will be further explained in the next section. When a placement representation is used in an iterative approach, it is of utmost importance that the computation of a placement from an abstract representation is very fast. Also, scalability of the placement evaluation step is a major concern [39]. Therefore, the computational complexity of a single placement evaluation (PE) step is also shown in Table 6.1 in the column headed by PE. Obviously, $O(n)$ is the best possible complexity when a from-scratch computation is desired. Both SP and BSG have super-linear complexities. However, the given values are based on the latest published results. Since no proof of optimality is known for either the SP or the BSG algorithms, we may conclude that improvement is not impossible. Summarizing, we have two types of representations:
- general non-slicing representations which have no layout restrictions, and
- specific restricted representations which have layout limitations; typically these representations are very efficient, albeit useful only in cases where such a limitation is allowed.
In the context of mixed-signal layout generation, restrictions on the layout form a bottleneck. Thus, general non-slicing representations are preferable.
popular computer game in which blocks are dropped down. topological change is a change in relative relationships between blocks.
Figure 6.2: Two meaningful placements, but (a) cannot be represented using an O-tree(-like) representation, while (b) can.
similar representations such as O-tree and B*-tree). In contrast, a sequence pair placement is shown in Figure 6.3(b). Clearly, the sequence pair representation is much less sensitive to small non-topological changes, since the location of a block is independent of, or at most linearly dependent on, the dimensions of another block.
Figure 6.3: The sensitivity of (a) a labeled ordered tree placement is clearly much larger than that of (b) a sequence pair placement. For instance, when one module is made a little smaller, another module moves to the bottom boundary of the placement area.
Summarizing, Table 6.2 gives for the placement representations mentioned in Table 6.1 their major advantages and disadvantages. These results will have a significant impact on the overall best candidate for placement representation in the context of mixed-signal layout generation. Many of the representations in Table 6.2 have been generalized to placement of rectilinear objects at the cost of increased complexity [74, 75, 76, 77, 78]. Also, range constraints, which cover both pre-placed blocks and boundary-constrained blocks, have been considered [79, 46, 77, 47]. A few of the representations have been adapted to take into account a very important analog-design-related issue which is called matching [80, 45, 72].
Table 6.2: An overview of placement representations with their advantages and disadvantages for mixed-signal layout generation.
representation flat Jepsen-Gelatt advantages merging is possible general flexible representation Polish expression normalized Polish expression sequence pair conflict-free channel routing conflict-free channel routing confined solution space general flexible representation low sensitivity symmetry bounded sliceline grid general flexible representation low sensitivity ordered tree non-slicing placement linear-time placement comp. compl. labeled ordered tree non-slicing placement linear-time placement comp. compl. B*-tree non-slicing representation linear-time placement comp. compl. binary tree efficient no fundamental improvement over underlying representation structure use of known tree-based algorithms corner block list non-slicing representation linear-time placement comp. compl. topological relation and orientation non-slicing representation cannot handle regular structures empty space is not represented extremely large solution space solution space includes infeasible placements large sensitivity very sensitive to small changes very sensitive to small changes quadratic placement comp. compl. larger solution space super-linear placement comp. compl. restricted slicing placement restricted slicing placement disadvantages unbounded solution space
Generally, if adding constraints reduces solution space size, this occurs at the cost of increased single placement computation effort. The final decision on which placement representation suits us best is based on our requirements, which in order of decreasing importance are:
- maximal generality and flexibility in order to fit mixed-signal and analog issues;
- low computational complexity to evaluate a placement;
- a small change in the abstract representation is associated with a small change in placement, essentially implying that the cost landscape is smooth;
- a small solution space so that searching for a good solution can be done more efficiently.
The previous discussion justifies the choice of the sequence pair representation for use in a mixed-signal layout generation framework. Details on this representation will be given hereafter.
Figure 6.4: Several conceptual aspects are shown of placement of rectangular modules in the plane space in connection with relative relationships/directions (above, below, left of, right of) in the sequence pair space.
where the grid size is $n \times n$, with $n$ the number of modules in the placement space. Furthermore, we define four disjoint directions, which we call above, below, left of and right of. The first two directions align with the vertical axis, and the last two directions
align with the horizontal axis. For reasons which will become clear shortly, each direction also corresponds to a two-character identifier. It is intuitively clear that the grid space only represents relative information between modules. With an additional step, absolute information can be added. Combined, this is sufficient for general placement representation. Now the concept of a packing will be described.
Definition 12 (Packing) A packing is a minimum-area feasible placement of rectangular modules associated with a given SP.
A packing essentially adds absolute information to the relative representation of the SP. However, there are still a few degrees of freedom left (unexploited) within a packing. Therefore, the Left-Down packing is defined.
Definition 13 (LD-Packing) An LD-packing is a packing in which each module is moved left and down as much as possible while preserving the topology dictated by the sequence pair.
Except when noted otherwise, each packing is an LD-packing in the remainder of this chapter. We will explain the notion of sequence pair first by an example and then more formally. An SP consists of two ordered sequences (or permutations), where the sequence elements are unique integers from $\{0, \ldots, n-1\}$. These integers are identifiers of the modules in the placement problem. Wherever convenient, we synonymously use such an identifier to denote a module. The sequences can be seen as two orthogonal axes that span a 2-dimensional grid-space. An example is shown in Figure 6.5. The ordering of
the elements (modules) in both sequences determines the relative relationships between these elements. For each pair of elements we have a before/after relationship within each sequence.
The combination of two sequences yields four relative relationships between module pairs: after, before, below, above. We say that a module $a$ is after (or right of) module $b$ when $a$ is located after $b$ in both sequences. A module $a$ is before (or left of) module $b$ if it is located before $b$ in both sequences. If a module $a$ is after module $b$ in the first sequence and before module $b$ in the second sequence, then we say that $a$ is below $b$. Similarly, if $a$ is before $b$ in the first sequence and after $b$ in the second sequence, then we say that $a$ is above $b$. This is also clear from the visual representation of an example sequence pair shown in Figure 6.5. For example, module 1 is after module 5, and module 4 is above module 6. From the two sequences, four sets4 can be derived for each module, which define the topological relationships right of, below, above, and left of, respectively. The definition of these sets is:
where $\cup$ is the union operator and $\cap$ is the intersection operator. By we denote the element on position of sequence , with a count index starting at 0. By we mean the index of element in sequence . If the upper index value is lower than the lower index value, the union operator gives $\emptyset$. Furthermore, we define . For example, if then and . If, in addition, then . It can be easily seen that the sets contain a lot of redundant information. For example in Figure 6.5, the right-of set of element 4 tells that elements 8 and 1 are right of element 4. Since the right-of set of element 8 already tells us that element 1 is right of 8, it is unnecessary to record this information once again in the set of element 4, because all relative relations are transitive. In other words, if 1 is right of 8 and 8 is right of 4, this implies that 1 is right of 4. As the information in the sets is stored for computation or later retrieval [48], it is more efficient to find a less redundant description. We introduce sets which are derived from the sets as follows:
In essence, these reduced sets are derived from the original sets by removing all (redundant) transitive information. If we leave out the index then . represents any of the sets given in (6.5) to (6.8); represents either of the sets given by (6.5) and (6.7); are defined analogously. For simplicity, we will use instead of if no
4 The
confusion is possible. Note that the sets and sets are symmetrically related. This also applies to the and sets [48]. Formally this is written as:
(6.9) (6.10)
where . Thus all (relative) topological information is available in two orthogonal (non-symmetrical) sets; for instance the and sets. However, for practical applications it is very useful to maintain all sets, for instance to improve run-time performance. Note that the transitively reduced sets maintain local topological information, whereas the original sets have a global character. An interesting property of the reduced sets is that they are necessary and sufficient to calculate a packing based on constraint graphs [81], under the assumption that we have no a priori knowledge of the sizes of the modules. In practice, we do have this knowledge, but it will turn out that a complete separation of relative and absolute placement computation is advantageous for incremental computation. This will be explained further on. Note that we are able to represent any packing of rectangular modules with the SP [48, 10], because we can find a sequence pair for every packing. Another advantageous property of the sequence pair is that it can be uniquely visualized in two dimensions by an oblique grid representation. The -45 degree axis represents one sequence and the +45 degree axis represents the other. Figure 6.6 shows an example of an SP and its oblique grid representation, denoted by grid hereafter. Furthermore, each module has four so-called views which uniquely correspond to its four sets. For example, one view of module 5 is the shaded area in Figure 6.6. It is clear that every reduced set is a subset of its corresponding original set.
Figure 6.6: Grid representation of a sequence pair.
Currently, two approaches exist to compute a packing from a sequence pair and a set of modules. The first approach, the graph-based method discussed in Section 6.5, is based on constraint graphs which contain the topological information given by a sequence pair. Within each constraint graph a longest path is sought. The longest path length that is found
corresponds to the width of a packing (for the horizontal constraint graph) and the height of a packing (for the vertical constraint graph). The second approach, which can be classified as a non-graph-based method, is discussed in Section 6.6. This approach is based on so-called longest common subsequence (LCS) computation [82]. The classical LCS problem is generalized into a maximum-weight common subsequence problem by Tang et al. [83]. A subsequence of a sequence is an ordered subset of the sequence elements in which the original relative order is preserved and adjacent subsequence elements need not be adjacent in the original sequence. We will give an overview of existing material on this topic and establish a few new links in this context, leading to a new lower bound on the computational complexity for computing a packing from scratch. Although the graph-based packing approach does not yield the most efficient packing computation technique in terms of computational complexity, it provides a convenient stepping stone towards the incremental packing computation approach which is described in detail in Section 6.7. The computational complexity of the incremental approach is . The computational complexity of the graph-based packing approach is , on average, and in the worst case. The best known average-case and worst-case computational complexity for non-graph-based packing computation is . Furthermore, in Section 6.11 constrained placement computation is described in detail. The basic idea is to impose spatial constraints on specific modules. For instance, a module can be forced to be placed at the right boundary of the chip area. Details will be given in the following sections.
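To make the longest-path idea concrete, the following minimal sketch computes an LD-packing directly from a sequence pair in quadratic time. It is neither the incremental algorithm nor the LCS-based method discussed in this chapter. It assumes the convention that a module is left of another if it precedes it in both sequences, and below another if it comes after it in the first sequence and before it in the second; the function name `pack` and the example data are hypothetical.

```python
def pack(seq1, seq2, sizes):
    """Compute lower-left coordinates of all modules for a sequence pair.

    seq1, seq2: the two sequences (permutations of the module identifiers).
    sizes: dict mapping a module identifier to its (width, height).
    Returns (coords, width, height) with coords[m] = (x, y).
    """
    pos1 = {m: i for i, m in enumerate(seq1)}
    x, y = {}, {}
    # Every module that is left of or below b precedes b in seq2, so walking
    # through seq2 visits the nodes of both constraint graphs in topological order.
    for b in seq2:
        placed = list(x)  # modules that precede b in seq2
        # Longest path in the horizontal graph: a is left of b.
        x[b] = max((x[a] + sizes[a][0] for a in placed if pos1[a] < pos1[b]), default=0)
        # Longest path in the vertical graph: a is below b.
        y[b] = max((y[a] + sizes[a][1] for a in placed if pos1[a] > pos1[b]), default=0)
    width = max(x[m] + sizes[m][0] for m in seq1)
    height = max(y[m] + sizes[m][1] for m in seq1)
    return {m: (x[m], y[m]) for m in seq1}, width, height


if __name__ == "__main__":
    sizes = {0: (2, 1), 1: (1, 2), 2: (1, 1)}
    coords, width, height = pack((0, 1, 2), (1, 0, 2), sizes)
    print(coords, width, height)
```

In this example module 1 lies below module 0, and module 2 lies to the right of both. Each coordinate is the length of a longest path in the corresponding constraint graph, so every module is pushed as far left and down as the sequence pair allows.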
Proof It is easy to see that the transitive edges induced by the constraint graphs constructed from the original sets are redundant, since we are looking for longest paths. Only these redundant transitive edges are removed in the reduced sets. Thus, sufficiency follows. To prove that the reduced sets are necessary, suppose that they are not necessary and we can do with less. Every element can have its weight increased to make it part of the longest path. Furthermore, the number of times an element occurs in a reduced set is equal to the number of unique paths that include this element. Thus if we remove an arbitrary element from one of these sets, then the path containing this element can no longer exist. This path could, of course, be a longest path by proper adjustment of certain weights. This is a contradiction. Thus necessity follows.
As a consequence, we can efficiently map an SP to its corresponding horizontal and vertical constraint graphs using the reduced sets. These constraint graphs only represent the relative relationships. In order to obtain absolute placement information, i.e. all coordinates of all blocks, longest path computations need to be performed. First we discuss how the relative placement information is computed, that is, we propose an algorithm to compute the reduced sets. After the way this algorithm approaches the problem, it is named the Direct View (DV) algorithm [81]. A few definitions to clarify some terminology are in order. For ease of discussion, the oblique grid is rotated 45 degrees clockwise. Moreover, we associate a quadrant, relative to a module, with each of four possible directions: quadrant 1 up, quadrant 2 left, quadrant 3 down, and quadrant 4 right.
Definition 14 (Direct View) A module $a$ is said to have a direct view on module $b$ in a specific direction if and only if $b$ is in the associated quadrant of $a$ and there is no other module in the rectangle spanned by $a$ and $b$.
Module $b$ is called the directly viewed module (or simply viewed module if no ambiguity is possible), and $a$ is called the viewing module. For example in Figure 6.7, modules 1 and 4 are directly viewed by module 3 when we look to the right. So module 3 has exactly two modules in its direct view to the right. Note that these viewed sets are exactly the reduced sets, and that every viewed set of size $k$ induces $k$ edges in the corresponding constraint graph. Now that we have shown, in the form of Theorem 2, that the information contained in the reduced sets is necessary and sufficient to represent all relative relationships between modules in a sequence pair context5, it is interesting to investigate the exact size of these sets. Since the size of the sets directly depends on the sequence pair at hand, we need to make an assumption in this respect. Intuitively, it is plausible that during the initial phase of stochastic optimization, no preference is given for a specific type of sequence pair. Therefore, the assumption that random sequence pairs are generated initially is perfectly valid. However, one may object to this assumption and pose that during the final phase of optimization, the optimization algorithm, which is simulated annealing in our case, converges to a specific sequence pair that may be some kind of worst-case sequence pair. Albeit imaginable, there is no clear reason why a final sequence pair should exhibit worst-case behavior. Indeed, our experiments in Section 6.9.2 unambiguously show that final sequence pairs exhibit average-case behavior. Even in the case where additional constraints are imposed on placement, there is no reason to assume some kind of adverse correlation between the structure of the constraint graphs and the quality of a placement.
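The direct view to the right can be computed naively from the definition. The sketch below is a quadratic-or-worse illustration, not the Direct View algorithm of [81]; it uses the sequence pair given in the caption of Figure 6.7, and the function names are hypothetical. A module is taken to be right of another if it comes after it in both sequences.

```python
def right_of_sets(seq1, seq2):
    """For every module a, the set of all modules that are right of a."""
    pos1 = {m: i for i, m in enumerate(seq1)}
    pos2 = {m: i for i, m in enumerate(seq2)}
    return {a: {b for b in seq1 if pos1[b] > pos1[a] and pos2[b] > pos2[a]}
            for a in seq1}


def direct_view_right(seq1, seq2):
    """For every module a, the modules it views directly to the right.

    Module b is dropped from the set of a whenever some module c lies
    between them (a left of c and c left of b), i.e. whenever the
    relation between a and b is implied by transitivity.
    """
    full = right_of_sets(seq1, seq2)
    return {a: {b for b in full[a] if not any(b in full[c] for c in full[a])}
            for a in seq1}


if __name__ == "__main__":
    # Sequence pair of Figure 6.7.
    s1 = (0, 11, 7, 10, 3, 1, 6, 13, 2, 9, 4, 14, 12, 5, 8)
    s2 = (14, 6, 5, 3, 4, 1, 7, 11, 0, 10, 13, 2, 9, 8, 12)
    print(sorted(right_of_sets(s1, s2)[3]))      # all modules right of module 3
    print(sorted(direct_view_right(s1, s2)[3]))  # modules directly viewed by module 3
    edges = sum(len(v) for v in direct_view_right(s1, s2).values())
    print(edges)  # number of edges in the reduced horizontal constraint graph
```

For module 3 the reduced set contains only modules 1 and 4, in agreement with the example in the text, while the unreduced set contains seven modules.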
5 We
Figure 6.7: Rotated oblique grid representation of SP = ((0, 11, 7, 10, 3, 1, 6, 13, 2, 9, 4, 14, 12, 5, 8), (14, 6, 5, 3, 4, 1, 7, 11, 0, 10, 13, 2, 9, 8, 12)).
The previous discussion justifies taking a random grid distribution for analysis purposes, or more exactly, a randomly selected sequence pair from the sequence pair solution space with all elements being equiprobable. To facilitate the analysis, we define the following.
Definition 15 A subsequence of a sequence is an ordered subset of the elements of that sequence, where the ordering is with respect to the element positions in the sequence.
Furthermore, we observe the following.
Theorem 3 Each common subsequence of a sequence pair is equivalent to a unique strictly increasing subsequence of a single sequence, which is a unique permutation of the module identifiers. Conversely, each strictly increasing subsequence in this permutation corresponds to a unique common subsequence of the sequence pair.
Proof A common subsequence of the sequence pair is, by definition, a subsequence of both sequences. By construction of the constraint graph, each common subsequence is equivalent to a path through the constraint graph (from left to right). Define a relabeling function which maps the modules in such a way that the first sequence becomes a strictly increasing sequence. Since the common subsequence is a subsequence of the first sequence, its relabeled version is a strictly increasing sequence. And since it is also a subsequence of the second sequence, its relabeled version is also a subsequence of the relabeled second sequence. Now choose the relabeled second sequence as the permutation. Since the relabeled first sequence is strictly increasing, it is clear (from an oblique grid visualization) that a path can exist in the constraint graph if and only if the nodes on the path occur in strictly increasing order in the permutation. Thus the nodes on the path also form a common subsequence of the relabeled sequence pair, which is easily written as a common subsequence of the original sequence pair using the inverse of the relabeling function.
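The relabeling used in the proof of Theorem 3 can be sketched as follows, using the sequence pair of Figure 6.7. The assumption here is that the relabeling simply maps each module to its position in the first sequence; the function names are hypothetical.

```python
def relabel_to_permutation(seq1, seq2):
    """Relabel modules so that seq1 becomes 0, 1, ..., n-1; return relabeled seq2."""
    label = {m: i for i, m in enumerate(seq1)}
    return [label[m] for m in seq2], label


def is_subsequence(sub, seq):
    """True if sub occurs in seq in the same relative order (not necessarily adjacent)."""
    it = iter(seq)
    return all(x in it for x in sub)  # each membership test advances the iterator


if __name__ == "__main__":
    s1 = (0, 11, 7, 10, 3, 1, 6, 13, 2, 9, 4, 14, 12, 5, 8)
    s2 = (14, 6, 5, 3, 4, 1, 7, 11, 0, 10, 13, 2, 9, 8, 12)
    perm, label = relabel_to_permutation(s1, s2)
    print(perm)

    # A common subsequence of (s1, s2) ...
    common = [3, 1, 13, 2, 9]
    print(is_subsequence(common, s1), is_subsequence(common, s2))

    # ... maps to a strictly increasing subsequence of the permutation.
    image = [label[m] for m in common]
    print(image, is_subsequence(image, perm))
```

Because the relabeled first sequence is the identity, being a subsequence of it is the same as being strictly increasing, which is what reduces the two-sequence problem to a single permutation.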
Corollary 1 We can analyze properties of a sequence pair indirectly by using the simpler single-sequence approach.
Definition 16 A maximal increasing subsequence of a sequence is a subsequence which cannot be enlarged by adding elements from the sequence without violating the monotonicity property.
With this definition we arrive directly at the following definition.
Definition 17 A longest increasing subsequence of a sequence is a maximum-cardinality subsequence over all maximal increasing subsequences of that sequence.
Consequently, a longest increasing subsequence is always maximal, but not vice versa. Let us denote a maximal increasing subsequence by , and its size by . We state a theorem taken from [84].
Theorem 4 Given a random sequence of length $n$, which is a permutation of $n$ distinct integers, the expected length of the longest increasing subsequence is asymptotically $2\sqrt{n}$.
Recapitulating, we want to compute the expected number of edges in the sparsified constraint graphs associated with the reduced sets, under the assumption of uniformly random sequence pair selection. Let us consider the sets associated with the horizontal constraint graph. The other sets can be treated similarly. Each pair consisting of a module and an element of its set is a directed edge in the constraint graph. So the number of outgoing edges from a node is equal to the size of its set. What we want to determine is the total number of edges in the constraint graph. As mentioned before, this number depends on the actual distribution of the modules in the grid, also called a pattern. The average number of edges is denoted by , while the maximum number of edges is denoted by . Note that the average and maximum are taken over all possible grid patterns. Moreover, note that a pattern is equivalent to a permutation (Corollary 1), and that the set of patterns is the set of permutations of $n$ elements. Hence, the total number of patterns is $n!$ if we disregard the node labels.
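The asymptotic behavior stated in Theorem 4 can be probed empirically. The sketch below computes the length of a longest increasing subsequence with the standard `bisect`-based method and averages it over random permutations. The function names, the sample size and the seed are assumptions of this example, and the known asymptotic value $2\sqrt{n}$ is used only as a reference.

```python
import math
import random
from bisect import bisect_left


def lis_length(perm):
    """Length of a longest strictly increasing subsequence of perm."""
    tails = []  # tails[k] is the smallest possible last element of an increasing subsequence of length k + 1
    for v in perm:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def average_lis_length(n, trials, seed=0):
    """Average LIS length over uniformly random permutations of n elements."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += lis_length(perm)
    return total / trials


if __name__ == "__main__":
    # Relabeled permutation of the sequence pair of Figure 6.7.
    print(lis_length([11, 6, 13, 4, 10, 5, 2, 1, 0, 3, 7, 8, 9, 14, 12]))
    n = 400
    print(average_lis_length(n, trials=200) / (2 * math.sqrt(n)))
```

For moderate values of $n$ the printed ratio is expected to lie somewhat below 1, since $2\sqrt{n}$ is only the leading asymptotic term.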
It can be easily verified that the maximum number of edges is obtained with a scenario such as shown in Figure 6.8, which consists of two columns of $n/2$ nodes each ($n$ is even, without loss of generality). Thus, the grid pattern has
$(n/2)^2 = n^2/4$ (6.11)
edges. Furthermore, if all nodes are vertically lined up in a single column, then the following holds: where is the set of all grid patterns. This implies
. Thus,
(6.12)
(6.13)
Placement
Figure 6.8: A simple worst-case pattern in a grid.

Definition 18 A string is a closed subsequence of a sequence, which is uniquely defined by two elements in the sequence to denote the start and end of the string, respectively.

Determining is done in the following simplified way, using Corollary 1. For the example shown in Figure 6.7 we can use a simple linear-time algorithm to define a mapping that transforms the first sequence into the increasing sequence (0, 1, ..., 14). If the same mapping is applied to the second sequence we obtain the permutation

(11, 6, 13, 4, 10, 5, 2, 1, 0, 3, 7, 8, 9, 14, 12),

which is shown in Figure 6.9. In this permutation we are looking for strings whose first element is smaller than their last element, and for which none of the numbers between the first and the last element of the string has a value between these two elements. (6.14)

Figure 6.9: The grid representation of Figure 6.7 can be relabeled in linear time to the above pattern, and this pattern can be described uniquely with a single sequence or permutation.
This is equivalent to the notion that the rectangle induced by the first and the last element of the string is empty. In other words, the first element sees the last element, or equivalently, the pair is an edge in the sparsified constraint graph. Consider a random pattern in an $n \times n$ grid. It is easy to see that the number of strings of length $k+1$ is exactly $n-k$. The probability that a string complies to (6.14) is equal to the probability that its first and last element are two consecutive elements of the string set, in increasing order. Let us denote this probability by $p_k$. For example, if the string is (4, 10, 5) then elements 4 and 5 are two consecutive elements and the string complies to (6.14). If the string is (4, 10) then elements 4 and 10 are two consecutive elements. The number of ordered pairs of consecutive elements from a set of $k+1$ elements is exactly $k$. Furthermore, the total number of ordered pairs is $k(k+1)$. So

$p_k = \frac{k}{k(k+1)} = \frac{1}{k+1}$,

where $k$ is the length of the string minus 1. The expected (or average) number of edges in an $n \times n$ grid can be written as

$\bar{E}(n) = \sum_{k=1}^{n-1} (n-k)\, p_k = \sum_{k=1}^{n-1} \frac{n-k}{k+1}$. (6.15)

After rewriting (6.15) with the identity [51]

$H_n = \sum_{k=1}^{n} \frac{1}{k} = \ln n + \gamma + O(1/n)$, (6.16)

where $\gamma$ is Euler's constant, we obtain

$\bar{E}(n) = (n+1) H_n - 2n$, (6.17)

which states the average number of edges in a constraint graph explicitly. We state this result in a theorem.

Theorem 5 The expected number of edges in a constraint graph is equal to $(n+1) H_n - 2n$,⁶ if each sequence pair is equiprobable, which is essentially $O(n \log n)$.

This result stimulates us to search for an algorithm which performs about $O(\log n)$ of work per node, resulting in an overall (average) computational complexity of $O(n \log n)$ for all nodes in a constraint graph.
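The counting argument can be checked by brute force for small patterns. The sketch below rests on one reading of condition (6.14): a pair of elements at positions i < j of a permutation is an edge when the first value is smaller than the second and no value strictly between them occurs in between. Under that assumption, with n the permutation length and k the string length minus 1, there are n - k strings of each length and each complies with probability 1/(k+1). The function names and the closed form (n+1)H_n - 2n are derived here for illustration and are not quoted from the thesis.

```python
from fractions import Fraction
from itertools import permutations


def count_view_edges(perm):
    """Count pairs i < j with perm[i] < perm[j] and an empty induced rectangle."""
    edges = 0
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            lo, hi = perm[i], perm[j]
            if lo < hi and not any(lo < perm[k] < hi for k in range(i + 1, j)):
                edges += 1
    return edges


def expected_edges_bruteforce(n):
    """Exact average edge count over all n! equiprobable permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_view_edges(p) for p in perms), len(perms))


def expected_edges_sum(n):
    """Sum over string lengths: (n - k) strings, each an edge with probability 1/(k+1)."""
    return sum((Fraction(n - k, k + 1) for k in range(1, n)), Fraction(0))


def expected_edges_closed_form(n):
    """(n + 1) * H_n - 2n, with H_n the n-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return (n + 1) * harmonic - 2 * n
```

Exhaustive enumeration is only feasible for very small n, but it is enough to confirm that the three expressions agree and that source and target edges are excluded, as in footnote 6.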
6 For simplicity, but without loss of generality, we disregard the edges coming from the source node and going to the target node, wherever convenient.
Theorem 5 implies that no graph-based algorithm exists which has average computational complexity lower than $O(n \log n)$ for computing the mapping from sequence pair to a packing (from scratch), under the assumption that all sequence pairs are equally likely. It follows directly from (6.17) that the average number of edges per node in the constraint graph is (6.18)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
/* right to left scan of the grid */ Find Bracket Pairs Initialize BP POS for to step do
closest below while do if then Update BP POS successor od if then Traces OBP Add if then Traces OBP Del
od /* left to right scan of the grid */ /* create sets from the sets */
Figure 6.10: The Direct View algorithm in pseudo code.

The DV algorithm performs a right-to-left scan and a left-to-right scan of the grid to gather enough information to construct the sets. During the scans so-called bracket pairs are used, which are defined to be a pair of nodes on adjacent horizontal grid lines. For every bracket pair the second node should lie above or below the first node for a right-to-left or left-to-right scan, respectively. An opening bracket is associated with the first node of such a pair, and the closing bracket with the second node. The first node of a bracket pair is also its identifier. For example, in Figure 6.7 the bracket pairs for the right-to-left scan are: [10,7], [1,3], [13,6], [2,13], [9,2], [12,14] and [8,5]. Function Find Bracket Pairs finds all bracket pairs in one vertical scan of the grid, which requires time and space complexity. Function Initialize BP POS holds some accounting information on the closest viewing node associated with a bracket pair. This accounting information can also be obtained in . During the right-to-left scan of the grid, two trees are maintained which hold the viewed nodes for the next node in the scan. The top-down tree holds all nodes for the view, and the bottom-up tree holds all nodes for the view. On line 8 a conditional statement checks if views ,
Figure 6.11: Subroutine for updating the traces in the top-down search tree and bottom-up search tree after adding a node associated with a bracket pair.
1 Traces OBP Del 2 begin 3 delete 4 delete 5 end
Figure 6.12: Subroutine which deletes a node associated with a bracket pair from the top-down and bottom-up trees.

which can be performed in constant time. If the statement is true then the appropriate set is updated. Update BP POS updates the accounting information for this bracket pair, which takes constant time. This loop breaks when the leaf node of the trace has been processed. Lines 14 through 19 check whether a bracket pair should be opened or closed, and call the functions to update the top-down tree and bottom-up tree accordingly. The left-to-right scan is performed analogously. After both scans have been performed, the sets can be constructed. In Figure 6.11 and Figure 6.12 the update routines of the traces in the top-down and bottom-up trees are shown. When the DV algorithm finishes, all sets have been determined, and herewith the constraint graphs are also known. Using these constraint graphs, we show next how to compute the absolute placement information.
to derive exact locations of all pins connected to the modules in order to assess the (global) routing quality, estimate the impact of substrate coupling between modules, determine the total chip area.
Since a sequence pair and the derived constraint graphs or sets do not provide absolute placement information in themselves, an additional mapping step is required to obtain absolute placement information from the graph representation. The missing information required to compute the absolute coordinates of the modules in a packing is directly derived from the module sizes. An efficient way to determine the absolute positions of all modules using the constraint graphs and the module sizes is by means of the longest paths algorithm. This algorithm effectively determines from a given source node all longest-path distances to all reachable nodes in the constraint graph. Since the constraint graph is directed and acyclic, the longest paths algorithm requires $O(|V| + |E|)$ complexity for a constraint graph $G(V, E)$ [34]. This can be written as $O(n + |E|)$. With Theorem 5 this leads to the result that the lower bound on the average complexity of computing the absolute module positions, using constraint graphs, is $O(n \log n)$. This result is a substantial improvement when compared with the original algorithm by Murata et al. [48], which has average (and worst-case) time and space complexity $O(n^2)$, where $n$ is the number of modules to be placed.

In the following, we will show how to compute the absolute module positions from the predetermined sets and the module sizes, by a simple example. For simplicity, we only discuss the horizontal case. The vertical case is similar. The sets, , uniquely define the horizontal constraint graph, where all outgoing edges of a node are given by . In order to compute absolute positions, every node is assigned a positive value (weight). This value is, for the horizontal case, equal to the width of the corresponding module. For example, with the sizes of the modules in this example shown in Table 6.3, and sequence pair SP = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)), the weighted constraint graph in Figure 6.13 is obtained.

Table 6.3: Sizes of the modules in the example placement with the constraint graph depicted in Figure 6.13 and the packing depicted in Figure 6.15.

module   0   1   2   3   4   5   6   7   8   9
width   13  44  36  91  79  56  58  35  28  28
height  49  55  82  90  36  84  33  27  70  65

For didactical convenience, two additional nodes are introduced: a start node and an end node (both with zero weight). They serve as start point and end point while walking through the constraint graph from left to right. With the length of an edge equal to the weight of the node inducing that edge, the result of this walk is that for each node, the longest-path distance to that node is recorded with the node. In practice, a very efficient longest paths algorithm can be used to compute these distances. The final distance values are the coordinates of the bottom-left corner points of the associated modules. Note that the distance recorded with the end node is equal to the width of the chip area. In Figure 6.14, both horizontal and vertical constraint graphs are shown for the example sequence pair. After the longest-path distances have also been computed for all nodes in the vertical constraint graph, the coordinates of the modules are known and an actual absolute placement is obtained. Figure 6.15 shows the final packing.

Let us annotate the previous notions in a formal manner. The horizontal constraint graph is defined by , and the vertical constraint graph is defined by
node       0    1    2    3    4    5    6    7    8    9    end
distance   0    155  0    36   0    0    56   56   127  91   199

Figure 6.13: The horizontal constraint graph associated with SP = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)). For clarity, the node-induced edge weights are explicitly given. Longest path distances are tabulated with the graph.
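The tabulated distances can be reproduced with a deliberately naive sketch. It assumes the usual sequence pair convention that module a lies to the left of module b when a precedes b in both sequences, and it takes the module widths from the node weights of Figure 6.13. It inspects all pairs, so it is a quadratic reference computation and not the sparsified graph-based method of this chapter; `horizontal_positions` is a hypothetical name.

```python
def horizontal_positions(first, second, width):
    """Longest-path x coordinate of every module, plus the total packing width."""
    pos_second = {m: i for i, m in enumerate(second)}
    x = {}
    for b in first:
        # Every module already in x precedes b in `first`; those that also
        # precede b in `second` lie to its left and push it to the right.
        x[b] = max(
            (x[a] + width[a] for a in x if pos_second[a] < pos_second[b]),
            default=0,
        )
    total_width = max(x[m] + width[m] for m in first)
    return x, total_width


first = (2, 3, 4, 5, 6, 8, 1, 7, 0, 9)
second = (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)
width = {0: 13, 1: 44, 2: 36, 3: 91, 4: 79, 5: 56, 6: 58, 7: 35, 8: 28, 9: 28}

x, total_width = horizontal_positions(first, second, width)
```

The resulting coordinates equal the longest-path distances listed with Figure 6.13, and the total width equals the distance recorded with the end node.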
Figure 6.14: The (a) horizontal and (b) vertical constraint graph associated with the example sequence pair.
, where
Figure 6.15: The packing of 10 modules with sequence pair and module sizes from Table 6.3. Hence
(6.19)
holds. If an efficient implementation based on an adjacency graph representation [34] is used, the memory requirements and time complexity of the graph operations are for constructing the constraint graphs, where are the edges of the constraint graph. Without loss of generality, we will only consider the horizontal constraint graph hereafter, denoted by if no confusion is likely. Formally, the longest-paths information is described by a longest-paths forest denoted by , where the weight function associates a module dimension with its corresponding node, and where is a recursive distance function defined by
(6.20)
A node , or equivalently . The equations given by (6.20) are so-called Bellman-Ford equations. Due to the fact that the constraint graph is directed and acyclic, the set of equations given by (6.20) can be solved uniquely, by performing an ordering step (depth-first search) followed by a relaxation step, both requiring $O(|V| + |E|)$ [34]. Furthermore, it is clear that . Summarizing, we proposed a graph-based approach for sequence-pair-to-packing computation which has (approximate) average computational complexity
$O(n \log n)$ (6.21)
where $n$ is the number of modules to be placed. The worst-case computational complexity of the approach is $O(n^2)$, but this can only occur in rare cases. The proposed algorithm is a significant improvement over the original $O(n^2)$ (worst-case and average-case) algorithm by Murata et al. [48].
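The ordering step followed by a relaxation step can be sketched as follows. This is an illustration under assumptions: the distance of a node without predecessors is taken to be 0, the length of an edge is the weight of the node it leaves (as described for Figure 6.13), and the ordering is obtained from Python's standard `graphlib` module instead of an explicit depth-first search. The tiny graph and its weights are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def longest_path_distances(successors, weight):
    """Longest-path distance of every node in a DAG.

    successors: node -> iterable of successor nodes
    weight:     node -> length of every edge leaving that node
    """
    predecessors = {v: set() for v in successors}
    for u, targets in successors.items():
        for v in targets:
            predecessors.setdefault(v, set()).add(u)

    distance = {}
    # static_order() yields each node only after all of its predecessors,
    # so every distance[u] used below is already final.
    for v in TopologicalSorter(predecessors).static_order():
        distance[v] = max(
            (distance[u] + weight[u] for u in predecessors[v]), default=0
        )
    return distance


successors = {"start": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["end"], "end": []}
weight = {"start": 0, "a": 3, "b": 5, "c": 2, "end": 0}
distance = longest_path_distances(successors, weight)
```

Each edge is relaxed exactly once, so the work is linear in the number of nodes plus edges, which is why the edge count of the sparsified graph governs the cost of this step.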
Sequence pair: ((3,5,1,6),(1,3,5,6))
LCS((3,5,1,6),(1,3,5,6)) = (3,5,6)
Equivalent single-sequence (permutation) representation: (3,1,2,4); all weights are 1.
Maximum-weight increasing subsequence of permutation (3,1,2,4) is (1,2,4).
Figure 6.16: The relationship between a sequence pair and a single sequence representation is shown. Moreover, it is clear from this simple example that an increasing (decreasing) subsequence in the single sequence representation corresponds uniquely to a horizontal (vertical) path in the constraint graph.

Surprisingly enough, the non-graph-based approach allows for a more efficient computation of a packing than the previously proposed graph-based method. The reason for this is that, given fixed known element weights, not all edges in the constraint graph are needed for proper longest paths computation. In other words, by exploiting a priori information on the actual node weights, some edges in the graph need not be generated.
maximum-weight sequence, is a sequence that has a maximum sum of the sequence element weights.
corresponds one-to-one to a longest path in the horizontal constraint graph. In [47] a very efficient weighted LCS algorithm is introduced, which is in fact the same algorithm as in [83] but with a more efficient priority queue. Since the sequence elements are taken from a finite set , the Van Emde Boas data structure can be applied successfully here. The MWCS algorithm is given in Figure 6.17. For a detailed explanation of this algorithm we refer to the original paper [83]. From the amortized analysis given in [83] it is clear that
Input: sequence pair and element weights Output: maximum-weight common subsequence
1 2 for to do 3 4 5 predecessor 6 if null then else 7 8 insert 9 successor 10 while null do 11 if then delete else break 12 successor 13 od 14 od 15 return predecessor
Figure 6.17: The maximum-weight common subsequence (MWCS) algorithm.

the following theorem must hold for the MWCS algorithm.

Theorem 6 The asymptotic complexity of algorithm MWCS is , where is the amortized complexity of the priority queue operations: insert(), delete(), successor(), predecessor().

Proof Obviously the loop from line 2 to line 14 iterates $n$ times. Let us denote the computational complexity of each of the queue operations by , where is the first character of the queue operation name. Then it follows directly that the worst-case computational complexity of all operations, excluding the while loop from line 10 to 13, is equal to . For the while loop, we can perform an amortized analysis which goes as follows. Since each element is inserted exactly once into , the total number of deletions is never more than $n$. The successor operation on line 12 is executed only if an element is deleted. Therefore, the amortized computational complexity is . As a result, the overall worst-case computational complexity of algorithm MWCS is , which can also be expressed as .

A direct consequence of Theorem 6 is that an implementation of algorithm MWCS with $O(n \log\log n)$ asymptotic time complexity and $O(n)$ space complexity is possible, using the Van Emde Boas data structure [58], which features $O(\log\log n)$ worst-case time complexity per queue operation. Note that the complexity values associated with the non-graph-based approach are worst case, as opposed to the average-case complexity of
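The scan of Figure 6.17 can be sketched in the following way. The sketch follows the predecessor / insert / successor / delete pattern described in the proof, but it replaces the priority queue by a sorted Python list handled with `bisect`, so each insertion or deletion costs linear time in the worst case; it illustrates the logic only and does not achieve the Van Emde Boas bounds. The variable names are hypothetical, and the example data are the sequence pair and node weights of Figure 6.13.

```python
from bisect import bisect_left


def mwcs_positions(first, second, weight):
    """Return (start, best): start[m] is the maximum total weight of a common
    subsequence ending just before module m, best the maximum over all of them."""
    pos_second = {m: i for i, m in enumerate(second)}
    keys = []    # positions in `second` currently stored, kept sorted
    value = {}   # stored position -> best total weight ending at that position
    start = {}
    for module in first:
        p = pos_second[module]
        i = bisect_left(keys, p)
        # Predecessor query: best chain ending at a smaller position in `second`.
        start[module] = value[keys[i - 1]] if i > 0 else 0
        total = start[module] + weight[module]
        keys.insert(i, p)
        value[p] = total
        # Delete successors whose stored weight is dominated by the new entry.
        while i + 1 < len(keys) and value[keys[i + 1]] <= total:
            del value[keys[i + 1]]
            del keys[i + 1]
    best = value[keys[-1]] if keys else 0
    return start, best


first = (2, 3, 4, 5, 6, 8, 1, 7, 0, 9)
second = (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)
width = {0: 13, 1: 44, 2: 36, 3: 91, 4: 79, 5: 56, 6: 58, 7: 35, 8: 28, 9: 28}
start, best = mwcs_positions(first, second, width)
```

With module widths as weights, `start` gives the x coordinates of the modules and `best` the packing width, matching the longest-path distances of Figure 6.13 without building a constraint graph.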
The set
of the elements in , and a weight function . of all monotone increasing or decreasing subsequences of , with . over .
Since there are no fundamental differences between an increasing instance and a decreasing instance of the MWMS problem, we will simply call it the MWMS problem. Although a clear link has been established between computation of a packing and the maximum-weight monotone subsequence problem [85], only incomplete links have been established between the maximum-weight monotone subsequence problem and related works in mathematics and computer science. The elegance and conceptual simplicity of the MWMS problem almost dictates that this problem is known and has been tackled before. Indeed, Mäkinen [86] surveyed the up-sequence problem and commented on the relationship between the MWMS problem and the maximum-weight clique problem in permutation graphs. It turns out that the maximum-weight clique problem in permutation graphs is equivalent to the MWMS problem. The former had been investigated by Chang and Wang [87] well before the introduction of the sequence pair representation. They proposed efficient algorithms for both the maximum-weight clique and maximum-weight independent set problems on permutation graphs with complexity $O(n \log\log n)$. Thus, in principle, the first and fastest known algorithm in terms of computational complexity for non-graph-based placement computation using the sequence pair representation was left undiscovered until this moment. For completeness, we will mention the approach taken in [87]. First, we define a clique.

Definition 19 (Clique) A clique in a graph is a complete subgraph of

In Figure 6.18, the set of nodes forms a clique since every node in the set is connected to every other node in the set. In order to find a maximum-weight (sum of all clique element weights) clique in a permutation graph, Chang and Wang observe that an isomorphic interval graph can be constructed in linear time from a permutation, which is a compact equivalent representation of a permutation graph. The obtained interval graph is
then used to find a maximum-weight set of weighted intervals with a known algorithm due to Hsu [88] with complexity , where is the number of intervals (and also the size of the permutation). Effectively, a maximum-weight decreasing⁸ subsequence is obtained in . For didactical purposes, let us consider again the sequence pair
visualized in Figure 6.5. Using Corollary 1, we map this representation to a single sequence representation with the mapping defined by
which turns into a strictly increasing sequence. If this mapping is applied to , we arrive at the single-sequence representation (or permutation)
(6.22)
A permutation graph associated with a permutation is defined by , where is the set of nodes and is the set of edges defined by
(6.23)
with . In words this means that an edge exists between two nodes and if and only if is larger than and is located before in the permutation , or is smaller than and is located after in permutation . Obviously, a one-to-one relationship exists between the permutation graph and the permutation. It can be verified, using (6.23), that a clique in a permutation graph corresponds uniquely to a strictly decreasing subsequence within the associated permutation. For example, (3,2,0) is a decreasing subsequence of , and , , and . Thus, set forms a clique, as expected. Also, (3,7) is not a decreasing subsequence. Consequently, there should not be an edge between 3 and 7 in the permutation graph . This is indeed the case, as . This notion can be easily generalized to the situation of weighted nodes. In that case, a maximum-weight clique corresponds to a maximum-weight decreasing subsequence, and vice versa.⁹ The technique proposed in [87] is to map the permutation graph (with permutation given) to a so-called isomorphic interval graph representation for which Hsu [88] presented an algorithm to compute maximum-weight cliques. The crucial point here is the computational complexity of the isomorphic transformation. It is proven in [87] that this transformation can be performed in linear time. The formal transformation is discussed after illustrating the above ideas with an example. For ease of understanding, we use unweighted nodes in the following example. Furthermore, since we want to discuss the horizontal subcase (equivalent to increasing subsequences)
8 The approach for finding a maximum-weight increasing subsequence is similar.
9 If we want to apply the ideas to strictly increasing subsequences, we can simply reverse the permutation sequence. Another interesting equivalent problem in connection with sequence reversing is left out of this discussion. The interested reader is referred to [87].
of packing computation and the algorithm given in [87] works by default on decreasing subsequences, we reverse the sequence of (6.22) and get
(6.24)
Figure 6.18(a) shows the permutation graph for permutation . Figure 6.18(b) shows the associated interval graph representation for this permutation graph. The construction of this graph is discussed shortly. The reader can easily verify that each pair of partially overlapping interval segments, say and , in Figure 6.18(b), with each segment associated with an element in , corresponds uniquely to an edge in Figure 6.18(a). Note that we deliberately put the nodes in the graph of Figure 6.18(a) in the same positions as given by the original sequence pair. Comparing this graph with the horizontal constraint graph of Figure 6.13 should directly reveal similarities. It is important to note at this point that a clique in the permutation graph of Figure 6.18(a) corresponds uniquely to a horizontal path in the constraint graph of Figure 6.13. As discussed before, an increasing subsequence corresponds uniquely to a clique. As a consequence, these notions are fully equivalent.

Formally, a given permutation graph, with permutation given, is mapped to an interval graph representation as follows. Each interval is defined as , . Add a super-interval , which is required for Hsu's algorithm. If and only if two intervals have partial overlap, i.e. either or , then an edge exists between nodes and in the permutation graph. With the constructed interval graph, the algorithm proposed by Hsu [88] can be used to compute a maximum-weight clique in the interval graph in time and space complexity, essentially similar to the approach and results of Tang et al., which were published many years later. However, it must be noted that the algorithm of Tang et al. is conceptually easier to understand. It is posed as an open problem whether or not the MWMS problem can be solved in linear time within our standard model of computation.
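The correspondence between cliques and decreasing subsequences can be checked on a small case. The sketch assumes the edge rule stated in words for (6.23): two values are adjacent when the larger one occurs before the smaller one in the permutation. The permutation used here is a hypothetical five-element example and not the one of (6.22) or (6.24); the interval graph construction of [87] is not reproduced.

```python
from itertools import combinations


def permutation_graph_edges(perm):
    """Edges {a, b} such that the larger of a, b occurs first in `perm`."""
    position = {v: i for i, v in enumerate(perm)}
    return {
        frozenset((a, b))
        for a, b in combinations(perm, 2)
        if (a - b) * (position[a] - position[b]) < 0
    }


def is_clique(nodes, edges):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))


perm = [3, 1, 2, 0, 4]  # hypothetical example permutation
edges = permutation_graph_edges(perm)
```

Here (3, 2, 0) and (3, 1, 0) are decreasing subsequences and form cliques, while (1, 2) and (3, 4) are increasing pairs and are not connected.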
However, we can derive that (in theory) it is possible to solve the MWCS problem in smaller complexity than which is obtainable through the use of existing practical data structures. In [89] optimal bounds on the predecessor problem are established. The theoretical result of that paper is a new data structure which stores integers from a universe of size in space and performs predecessor queries in
time. In conjunction with Theorem 6, we may conclude that the computational complexity of algorithm MWCS can be improved to
Since a solution of the MWCS problem is also a solution to the MWMS problem, the same achievable computational complexity holds for the latter. Summarizing, we can say that
The non-graph-based placement computation approach is computationally more efficient than a graph-based approach. The former can be practically implemented with $O(n \log\log n)$ complexity, while the latter needs $O(n \log n)$.
Figure 6.18: (a) The permutation graph corresponding to the permutation. (b) The associated interval graph representation.
Owing to the fact that some redundancy is incorporated in the graph-based placement computation approach, it can be more easily generalized to an incremental approach. We do not claim that this is impossible with the non-graph-based approach. However, it is surely much more difficult. Another advantage is that the division between relative and absolute computation of the graph-based approach yields better (visual) insight into the problem. Consequently, analysis and design of relevant algorithms is substantially facilitated.
From both a theoretical and a practical point of view, it is more interesting to investigate an incremental generalization of the graph-based placement computation approach. From theoretical analyses, we might find interesting and exploitable properties that were previously unknown. Also, fundamental links with other approaches might be indirectly established. From practice, we gain important experience on how the incremental approach relates to the non-incremental approach in terms of run-time performance. From this information, practical guidelines can be derived for the usage of the incremental algorithms.
-swap: the topology of the constraint graph generally changes, and also the longest paths information must be updated;
swap: the topology of the constraint graph is unaffected, but the longest paths graph must be updated;
rotate (over multiples of $90^\circ$): there is no change in relative relationships, and the longest paths graph must be considered for an update only if the rotation angle is $90^\circ$ or $270^\circ$;¹⁰
mirror (horizontally or vertically): the constraint and longest paths graphs are unaffected.
Typically, only a small part of a placement is actually affected by a perturbation. This fact can be exploited by an incremental computation approach. Furthermore, as will be clear shortly, the incremental approach is exact, i.e. no error in the placement is introduced. As a consequence, the quality of a placement obtained by incremental techniques is essentially the same as one that is obtained by compute-from-scratch techniques. It is convenient to specify exactly what is meant by an affected module and a moved module in this context.
10 Note that rotation over $180^\circ$ does not have any influence on the placement, but that it does influence the routing because pin positions are changed.
Definition 20 (Affected module) A module is affected if it is an operand during a perturbation, or if its location can be influenced by that perturbation.

Definition 21 (Moved module) A module is called a moved module if its location has actually changed due to a perturbation between consecutive iterations.

If a module changes orientation or is moved, generally all nets connected to that module have to be re-routed. As a result, in all of the above perturbation cases, the routing has to be recomputed in some way. Generally, we can state that the more modules are affected, the more routing effort is required. We split the incremental packing computation in two parts. The first step computes the modified constraint graph in an incremental way. The second step computes the longest paths information in an incremental way.
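The notion of a moved module can be made concrete with a from-scratch reference computation. It is a sketch under assumptions: module a lies left of b when a precedes b in both sequences, and a lies below b when a follows b in the first sequence and precedes it in the second. The helper names `pack`, `swap_in_both` and `moved_modules` are hypothetical, and the quadratic packing is only a baseline against which an incremental method could be checked.

```python
def pack(first, second, width, height):
    """Bottom-left corner (x, y) of every module, computed from scratch."""
    pos2 = {m: i for i, m in enumerate(second)}
    x, y = {}, {}
    for b in first:            # modules already in x precede b in `first`
        x[b] = max((x[a] + width[a] for a in x if pos2[a] < pos2[b]), default=0)
    for b in reversed(first):  # modules already in y follow b in `first`
        y[b] = max((y[a] + height[a] for a in y if pos2[a] < pos2[b]), default=0)
    return {m: (x[m], y[m]) for m in first}


def swap_in_both(first, second, a, b):
    """Exchange modules a and b in both sequences (relative topology is kept)."""
    exchange = {a: b, b: a}
    return ([exchange.get(m, m) for m in first],
            [exchange.get(m, m) for m in second])


def moved_modules(before, after):
    """Modules whose location actually changed (Definition 21)."""
    return {m for m in before if before[m] != after[m]}


width = {0: 2, 1: 3, 2: 4}
height = {0: 1, 1: 1, 2: 1}
first, second = [0, 1, 2], [0, 1, 2]
before = pack(first, second, width, height)
after = pack(*swap_in_both(first, second, 1, 2), width, height)
moved = moved_modules(before, after)
```

In this three-module row, exchanging modules 1 and 2 moves only those two modules; module 0 keeps its location and would not need re-routing on account of its own position.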
Figure 6.19: The direction of rotation associated with an -swap depends on the relative positions of modules and . The nodes are drawn in an oblique grid view.
Figure 6.20: The direction of rotation associated with an -swap depends on the relative positions of modules and . The nodes are drawn in an oblique grid view. the relative orientation of the nodes under consideration. Let us call these two nodes and
, and let and divide the grid in nine regions denoted by . The gridlines belonging to nodes and are not part of these regions. Furthermore, for ease of discussion, we will rotate the oblique grid 45 degrees so that we get a grid with horizontal and vertical lines, in which sequence is vertically aligned with the left side (top down) and sequence is horizontally aligned with the bottom side (left to right). Figure 6.21 visualizes this idea. The nodes in these regions are elements of nine disjoint sets denoted by .

Figure 6.21: Two nodes and divide the grid in nine regions.

From Figure 6.19 and Figure 6.20 it is clear that there are 4 possible scenarios for computing the new sets after a perturbation operation. In order to construct a fast and efficient algorithm we assume that all elements of an set are stored in order. The order is defined by the position of the element in sequence . This could, for instance, be achieved using an implementation based on balanced binary search trees, such as splay trees [54]. We discuss an example to illustrate the way in which sets are updated. Suppose we have the situation as shown in Figure 6.22. The situation before perturbation -swap is
Figure 6.22: Perturbation -swap is performed on nodes and . The left side is the situation before the perturbation at time . The right side is the situation after the perturbation at time .

denoted by , and the situation after -swap is denoted by . The two nodes and are vertically oriented (in the oblique grid) at time and after the -swap at time they are horizontally oriented (in the oblique grid). Assume we want to update the sets and , then the following two cases can occur:
1. , where the algorithm in Figure 6.25 can be applied,
2. , where the algorithm in Figure 6.26 can be applied.
First an example is discussed to illustrate the approach, which is subsequently formalized into an algorithm. Figure 6.23 depicts an illustrative example in which the computation of the set is shown. The time order in which the nodes are processed is indicated by a number; the node with number 1 is processed first, and the node with number 6 is processed last. Furthermore, the dotted arrows visualize that a node is directly viewed by another node, and thus is an element of that node's set.

Figure 6.23: An example to illustrate the manner in which is constructed according to the algorithm shown in Figure 6.25.

At the left-hand side of Figure 6.23, the situation before the -swap of nodes and is depicted. We start from this scenario to find the nodes that will be in set after the perturbation, in other words the set. The search for the nodes in is initiated by looking for the rightmost node in region which is not directly viewed by another node in region . Once found, the nodes that are directly viewable by node at time are added from right to left to (which is initially empty). In Figure 6.23, the sequentially added nodes are numbered 2, 4 and 5.

These steps deserve some additional explanation. Searching for the rightmost node in region which is not directly viewed by another node in region is easily accomplished as follows. Start looking for the rightmost node in . Assume, without loss of generality, that this node (node 1 in Figure 6.23) lies in region . Clearly, none of the nodes in region can be an element of . Therefore, if the rightmost node that has been found is not in region , iteratively look for the rightmost node in the set of the last found rightmost node , until the newly found node is in region . Suppose this node is (node 2 in Figure 6.23). Search for the node just left of in . Suppose this node is . At this point, we distinguish two cases: and . In the latter case, we simply add to . So assume (node 3 in Figure 6.23). In this case, we set and proceed looking for the node in just left of the last found node in region (node 4 in Figure 6.23). This process is iterated (node 5 in Figure 6.23) until the node (node 6 in Figure 6.23) does not contain an element in its set which is left of the last added node (node 5).
This completes the computation of . Let us proceed with an example in which we want to compute . Assume without loss of generality, that region is non-empty. Figure 6.24 depicts an example scenario. The determination of is performed in two phases. First, all directly viewable nodes in region and are determined. Second, all directly viewable nodes in region are determined. Note that if and only if region is empty, node is an element of . At the
left-hand side of Figure 6.24, we see that the nodes in regions and which are to be added to , are searched for in the view direction. We start with node and search for the leftmost node right of in (node 1 in Figure 6.24). This node is called the reference node. Now we proceed by using the last found node, say , to find the leftmost node right of node in (node 2 in Figure 6.24). This step is iterated (node 3 in Figure 6.24) until no such node can be found. The remainder of the elements in is located in region . Finding those nodes is straightforward. A strict requirement is that all those nodes should lie above the reference node (node 1 in Figure 6.24). First find the rightmost node, say , in (node 4 in Figure 6.24). Now find the rightmost node in which is left of the last found node (node 5 in Figure 6.24). Repeat this step until the previously given requirement is violated or no further nodes can be found.

Figure 6.24: An example to illustrate the manner in which is constructed according to the algorithm shown in Figure 6.25.

Formally, the algorithm that computes the new sets for this specific orientation and perturbation scenario is shown in Figure 6.25. Function find max searches for the element in with largest position index in sequence . The execution of find max on line 2 finds the rightmost node in region and (if it exists). If no such element exists, then . In the first while loop from line 4 to 7, the algorithm efficiently searches for the rightmost element in region . If no such element exists then . The second while loop from line 8 to line 19 determines all elements in . If a node is found in region using function find max on a previously determined reference node , then this node must be in , which is accomplished on line 11. Once the rightmost node has been found, the rightmost node which is left of a previously added node to and directly viewable by the reference node is searched for.

Computation of the is performed on lines 20 through 33. On line 22 the algorithm searches for the leftmost node which is right of and in . This implies that this node must be located in region of . Moreover, this node must be an element of , which is established on line 25. Figure 6.24 illustrates the search for the nodes in region and which must be included in . However, the construction of is not complete yet, as region might contain more nodes that should be in . The determination of these nodes is accomplished by the while loop from line 30 to 33.
Node , which is defined on line 23, acts as a reference node. All nodes in region must lie above this node and be an element of , which is a requirement to be part of .
[Figure 6.25 listing: pseudocode of 33 numbered lines with input and output "updated" sets, built from while loops, if/elsif tests and calls to find_max and find_predecessor.]
Figure 6.25: Incremental update algorithm for updating the sets, . Before perturbation, nodes and are vertically oriented (see Figure 6.22).
Construction of the and sets starting from an initially horizontal orientation of and goes according to the algorithm shown in Figure 6.26. With the explanation of the previous algorithm, it is easy to understand the actions of the algorithm in Figure 6.26. Note that for notational convenience a dummy node is introduced, with . Analogously, the other sets can be computed by properly adapting the previously discussed algorithms to the specific situations. It is also possible to use the symmetry relationship given by (6.9) and (6.10) to compute the sets for the other viewing directions.
[Figure 6.26 listing: pseudocode of 38 numbered lines with calls to find_predecessor.]
Figure 6.26: Incremental update algorithm for updating the sets, . Before perturbation, nodes and are horizontally oriented.
Due to the perturbation of nodes and , nodes in the regions , , , and might have been affected too, in terms of their sets. Therefore, we have to trace down which of these nodes need to update their sets. Fortunately, by virtue of the symmetry relationships (6.9) and (6.10), these nodes can be determined easily. For the discussed scenario, the set is at most
(6.25)
(6.26)
where set difference is defined by . This is demonstrated by the scenario shown in Figure 6.27. Clearly, updating only the nodes in the set difference would ignore node , due to the fact that node prevents node from being in , and node is not in . It is clear that definitely needs updating because node should not be in . Therefore, using (6.26) is not adequate. Obviously, the use of (6.25) is sufficient. However, it may be possible to exploit a more clever technique based on the symmetry properties of the sets given by (6.9) and (6.10) which minimally extends the set (6.26) to render it sufficient. This can be intuitively understood by the observation which we derive from the following theorem.
Figure 6.27: An example showing the insufficiency of using set difference of nodes as given by (6.26). At the left side and . At the right side and .
Theorem 7 If a non-empty rectangle induced by unperturbed nodes and becomes empty, or vice versa, due to a perturbation, both the set and the require updating. The view direction is from node to node and the opposite view direction is from node to node .
Proof By definition, if and only if the rectangle induced by and is empty. Therefore, if a non-empty rectangle becomes empty, or vice versa, we have a change and consequently the and sets must be updated.
A direct consequence of Theorem 7 is the following.
Corollary 2 If a previously empty rectangle becomes non-empty or a non-empty rectangle becomes empty, and is the moving node, then and must be updated. The directions and are defined in Theorem 7.
We can use Corollary 2 as an aid to identify pairs of nodes such as defined by Theorem 7. In other words, if a node requires an update of its set, we determine node and update its set. Finding can be accomplished with .
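The emptiness criterion of Theorem 7 can be illustrated with a brute-force check. The sketch below is not the incremental algorithm of Figures 6.25 to 6.29; it assumes that each node is placed on the oblique grid at (position in the first sequence, position in the second sequence), and the function names are hypothetical. It lists the pairs of nodes whose induced rectangle switches between empty and non-empty when the sequence pair is perturbed, which are the pairs whose sets would need updating.

```python
from itertools import combinations


def positions(sequence):
    """Map each module to its index in a sequence."""
    return {module: index for index, module in enumerate(sequence)}


def rectangle_empty(a, b, pos_first, pos_second):
    """True if no third node lies strictly inside the rectangle induced by a and b."""
    lo_f, hi_f = sorted((pos_first[a], pos_first[b]))
    lo_s, hi_s = sorted((pos_second[a], pos_second[b]))
    return not any(
        lo_f < pos_first[c] < hi_f and lo_s < pos_second[c] < hi_s
        for c in pos_first
        if c != a and c != b
    )


def changed_pairs(before, after):
    """Pairs whose rectangle emptiness differs between two sequence pairs.

    `before` and `after` are (first_sequence, second_sequence) tuples over
    the same modules. Brute force, cubic in the number of modules.
    """
    pf0, ps0 = positions(before[0]), positions(before[1])
    pf1, ps1 = positions(after[0]), positions(after[1])
    return {
        (a, b)
        for a, b in combinations(sorted(pf0), 2)
        if rectangle_empty(a, b, pf0, ps0) != rectangle_empty(a, b, pf1, ps1)
    }


if __name__ == "__main__":
    before = ([1, 2, 3], [1, 2, 3])   # node 2 sits between nodes 1 and 3
    after = ([2, 1, 3], [1, 3, 2])    # node 2 has moved out of that rectangle
    print(changed_pairs(before, after))
```

In this small case only the pair (1, 3) changes: its rectangle was blocked by node 2 before the move and is empty afterwards. A brute-force check like this is mainly useful as a reference against which a faster incremental update can be tested.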
The complete set of unperturbed (static) nodes for which we have to update the and sets must be located in regions , , , . For ease of discussion, let us assume that we have a node in located in region , as shown in Figure 6.28. Without loss of generality, assume nodes and are in region and , respectively, and both nodes are viewed by . After the perturbation, the shaded area must be explored in order to find nodes which should be included in . With the aid of previously discussed techniques it is quite easy to identify those nodes efficiently. The steps to be taken are formalized in the algorithm shown in Figure 6.29, which efficiently computes the set. On line 2, we search for node . If it exists, variable records its position in sequence . If not, the shaded region extends to the lower boundary of the grid and is assigned a very large number. Line 4 determines node . If a node is found in region then, on line 7, we look for a module in 's view at time which is left of . If the node is located in region or does not exist then node will be viewed by after the perturbation. In this case, lines 9 and 10 are executed, finding a node in the shaded region which extends to the right of region and in 's view at time . Finally, in the loop from line 12 to line 15, we repeatedly add nodes to from 's view, if they are above node . This is done in a right to left fashion using the predecessor approach. It is somewhat elaborate but quite straightforward to adapt the algorithm in Figure 6.29 to the other cases.
Figure 6.28: Node in region will be affected by the perturbation of nodes and . The shaded area induced by nodes and may contain nodes which will be viewed by after the perturbation.
[Figure 6.29 listing: pseudocode of 15 numbered lines with input and output "updated" sets, containing a while loop on lines 12 to 15 and calls to find_predecessor.]
Figure 6.29: An algorithm to compute ; essentially determining any nodes that will be viewed in the shaded region of Figure 6.28.
for a randomly picked (with each equiprobable). Therefore, the search will take . If (and only if) , the function returns . The loop from line 4 to line 7 will be iterated at most times due to (6.27). Moreover, the member check on line 4 can be performed in constant time. Consequently the first while loop has complexity . The second while loop from line 8 to 19 is also iterated at most times due to (6.27). At most elements are added on line 11. Adding elements to an (initially empty) set takes , because
(6.28)
where is some constant. Since find_predecessor() on lines 13 and 17 within this loop takes , the total computational complexity for constructing the updated set is .
Similar arguments hold for the computational complexity of the while loops from lines 24 to 28 and from 30 to 33; both run in . The resulting total computational complexity of the incremental update algorithm to compute the updated sets, , is . The same line of reasoning can be applied to the algorithm of Figure 6.26, which computes the sets, , but where the nodes and are horizontally oriented before perturbation.
To complete the analysis, the algorithm shown in Figure 6.29 should be included. This algorithm is the basis of the overall approach to compute the updated sets for all . On line 1 the old set is copied to the new set. This takes using (6.28). Operations find_predecessor() and find_successor() on lines 2 and 4, respectively, use . The same complexity applies to the operations on lines 7, 9 and 10. The dominant part of the algorithm is the while loop from line 12 to 15. By (6.27), the loop iterates times. Within each iteration we have to add an element to and run find_predecessor(). Thus, by similar arguments as before, the result sums up to total computational complexity.
Since there are nodes for which the aforementioned complexity is required, an upper bound on the average computational complexity is given by
(6.29)
(6.30)
Hence, the resulting grand total for the complete incremental constraint graph computation approach is
(6.31)
Clearly, an absolute lower bound on the average computational complexity is
(6.32)
Note that both complexities in (6.31) and (6.32) cannot be reduced by using the more efficient Van Emde Boas data structure [58]. The reason for this is that the universe of elements within a single set has size . In theory, the aforementioned complexities can be improved, but from a pragmatic point of view this is at least impractical since no implementations of theoretically more efficient data structures have been reported as yet.
It can be verified that the following property holds for sequence . Going from left to right through sequence , each sequence item that is smaller than all of its predecessors is a start node. The first sequence item is a start node by definition. It is easy to map this sequence element in back to the original module number using the inverse mapping . For example, when we take the sequence pair shown in Figure 6.6, the mapping is given by
and the inverse mapping is obtained by reversing the direction of the arrows. When is applied to sequence of the example, the following sequence is obtained for :
The start nodes of this sequence are , which can be found in linear time. Using the inverse mapping it is straightforward to find that the original start node numbers are , which can be verified with Figure 6.6. It is clear that the aforementioned approach also works for sub-sequences, i.e. a permutation of a subset of . This is also the benefit of this procedure, since the complexity is linear in the size of sequence (see footnote 11). Note that in general, does not hold for start nodes of a sub-sequence of .
The new incremental algorithm which computes the longest paths through constraint graph after a perturbation is discussed hereafter. Essentially, the longest-paths forest is made inconsistent after perturbing and the purpose of the incremental algorithm is to recompute (partial) longest paths in order to make consistent again. We define four types of inconsistencies:
1. under-consistent; applies when the distance value of a node is lower than its consistent value given by (6.20),
2. over-consistent; applies when the distance value of a node is higher than its consistent value given by (6.20),
3. LP-underconsistent; applies when and and is distance-consistent,
4. LP-overconsistent; applies when and , where .
We also refer to the first two inconsistencies as distance-inconsistencies, while the last two inconsistencies are also called LP-inconsistencies. Graph is called distance-consistent when all distance values comply with (6.20). A graph is called consistent when it is both distance-consistent and LP-consistent. The incremental longest-paths (ILP) algorithm is given in Figure 6.30 and operates as follows. On lines 1-3 the distance fields of all affected nodes are set to zero so as to force correct computation of their new distance due to (new) incoming edges. The outer loop starting on line 5 and ending on line 29 checks if all candidate nodes given by set have been processed. Each processed candidate node is
11 With current sorting algorithms, the worst-case complexity is increased to when .
and only
eligible for annotation as a moved module. Consequently, the number of moved modules is at most equal to the total number of candidate nodes. On line 6 the start nodes are found for set using the single sequence approach described earlier. Note that line 6 is executed at least once (with ), and possibly thereafter when the priority queue is empty and . This occurs when the start node(s) propagate(s) changes through the longest paths forest, but not all affected nodes are processed during this update. The inner loop starting at line 7, processes all (distance) inconsistent nodes that are encountered during a single propagation wave. By virtue of the absence of cycles in (and ), an edge is processed at most once. Furthermore, extracting the smallest distance node from on line 8, guarantees that all candidate nodes are made consistent exactly once. The latter is performed on lines 9 through 14. For each node that is made consistent, all its outgoing edges are processed and the corresponding nodes are checked for inconsistency on lines 15 through 23. Each outgoing node of is checked for under-consistency on line 17, and checked for over-consistency on line 18. Note that over-consistency can only occur if of the inconsistent input graph. An inconsistent node will have its distance updated on line 20 and it will be put (back) on the heap with its new distance value for further processing on line 21. Line 22 will tag an inconsistent node so that it will be made consistent in the iteration where it is extracted from the priority queue . Lines 23 through 26 cover the case in which a node is LP-inconsistent. In these cases is added to to re-establish consistency. Below are some brief descriptions of the functions that are used in the incremental longest paths algorithm.
adjust_heap inserts in heap if is not an element of the key space of , otherwise it adjusts the value field associated with if it is smaller than .
extract_min removes the pair with minimal value field from the heap and returns the key field of that pair.
recompute_dist computes the longest distance from all predecessors of to .
update_lp_pred updates the longest path information from the predecessors of to .
insert_lp_pred
find_start_nodes finds all start nodes from set using the single-sequence technique as described previously.
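The start-node rule stated earlier (going from left to right, every item that is smaller than all of its predecessors is a start node) amounts to collecting the running minima of the sequence. A minimal sketch follows, assuming the sequence has already been mapped to integers; the example sequence and the module-to-index mapping are made up for illustration and are not those of Figure 6.6.

```python
def find_start_nodes(sequence):
    """Return the items that are smaller than every item to their left.

    The first item always qualifies. Runs in time linear in len(sequence).
    """
    start_nodes = []
    smallest = None
    for item in sequence:
        if smallest is None or item < smallest:
            start_nodes.append(item)
            smallest = item
    return start_nodes


if __name__ == "__main__":
    # Hypothetical mapping from module numbers to positions in one sequence.
    mapping = {7: 0, 3: 1, 9: 2, 1: 3, 4: 4, 8: 5}
    inverse = {index: module for module, index in mapping.items()}

    mapped = [3, 5, 1, 4, 0, 2]            # the other sequence after mapping
    starts = find_start_nodes(mapped)
    print(starts)                           # running minima of the mapped sequence
    print([inverse[s] for s in starts])     # back to original module numbers
```

The same function works unchanged on a sub-sequence, since it only compares items with their predecessors.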
[Figure 6.30 listing: pseudocode of the ILP algorithm, built from foreach, while and if/elsif constructs with calls to adjust_heap and insert_lp_pred.]
Figure 6.30: The incremental longest paths (ILP) algorithm.
is bounded by , where is an adaptive parameter that captures the set of vertices with a changed input or output value. Moreover, is the number of vertices of which the input or output value changes, and is equal to plus the number of edges incident on some node in . Because the single-sink-shortest-paths problem is similar to the single-source-longest-paths problem, the algorithm is suitable for incremental computation of longest paths in the constraint graphs. Fortunately, the constraint graphs under consideration are directed acyclic graphs (DAGs) and we know that for this subclass of graphs, the longest path algorithm runs in . As a consequence of this property, we are able to use algorithms that have incremental computational complexity . A practical question remains on the parameter : how does it relate to the problem size? In general, is an unknown parameter; it can only be quantified after the actual computation. However, we know . In the specific case of a constraint graph induced by a sequence pair, we are able to say some things about in a quantitative way, under certain presumptions. Note that the impact of a random perturbation on is of a global nature, as opposed to the impact on the sets which is of a local nature. The underlying reason for this fact is that essentially embodies both relative and absolute information changes, whereas the sets only represent relative information. Analyzing the average in terms of a random
perturbation is most convenient from a mathematical point of view. The validity of this approach is demonstrated in Section 6.9. Note that we have a bijective mapping between constraint graphs and sequence pairs. As a consequence we can write every property of the constraint graph as a property of the corresponding sequence pair. Furthermore, the sequence pair properties can be analyzed in a simplified way using a single permutation (Corollary 1). We want to quantify the average size of which is the expected number of nodes in a longest paths subtree of the reduced constraint graph . We know that the constraint graphs have the property of being node-weighted, i.e. the outgoing edges of a node all have the same weight determined by the corresponding node. However, to simplify analysis we assume that on average the node weights do not determine the (average) topology of the longest paths subtrees, but the depth values (determined by a depth first search from the source node(s)) of the vertices do. This statement deserves some additional explanation. A graphical representation of an example packing of 10 modules, its horizontal constraint graph and its associated longest paths subtree is shown in Figure 6.31. Intuitively, the previous assumption is quite reasonable, as the weight of a node does not change the topology of the constraint graph ; only might be affected. On the other hand, a change in depth value of a node (caused by a perturbation) does affect the topology of and therefore is always affected. Figure 6.32 shows the impact of a change in weight of a node. In this example, node 6 (module 6) has its weight (width) increased from 58 to 78. We see directly from Figure 6.32(b) that the topology of the constraint graph does not change compared to Figure 6.31(b). However, note the change in the longest paths subtree ; the longest path edge (3,8) is removed and edge (6,8) has become part of a longest path.
This is also clearly visible from the packing in Figure 6.32(a) where modules 8 and 1 are moved to the right to allow module 6 to expand in width. Note that the width of the chip area increases from 199 to 206, as indicated by the distance value of dummy node . Figure 6.33 shows the packing and the associated constraint and longest paths graph after performing an -swap of nodes 4 and 7 on the state represented by Figure 6.31. We see directly from Figure 6.33(a) that the packing is quite substantially affected. Moreover, Figure 6.33(b) shows that the constraint graph and longest paths subtree are both affected. Define , the size of a subtree in , as the sum of the nodes reachable from node plus one. Then the average subtree size is defined by
(6.33)
Each node in a subtree contributes to the total , the number of times it occurs in any subtree. Let us call the total number of occurrences of a node in any subtree, the multiplicity of that node. Thus we can write (6.33) as
(6.34)
The multiplicity of a node is exactly the number of ancestor nodes of plus one. We
assume without loss of generality that each node has at most one parent. If a node has more than one parent in , this means that there is more than one longest path to this node. If this situation occurs frequently on average, the diversity of the module dimensions must be low. In a practical problem instance this is highly unlikely, and thus the probability that a node has more than one parent is negligible (see footnote 12).
[Figure 6.31 drawing: (a) packing and (b) constraint graph, with node labels of the form distance / weight.]
Figure 6.31: (a) A packing of 10 modules and (b) the associated horizontal constraint graphs and longest paths subtree. The modules are drawn as nodes and the arrows denote relative relationships. Moreover, the solid arrows define the edges in the longest paths subtree . Each node has two associated integers; the value left of the slash symbol (/) denotes the distance of the bottom-left -coordinate of the module relative to the reference point 0, and the value right of the slash symbol denotes the weight (width) of the associated module.
So the expected multiplicity of a node is also equivalent to the expected length of a maximal common subsequence of sequence pair . The expectation should be taken for a given element over all possible configurations for a (typical) fixed topology of . The latter is done for simplicity but without loss of generality. Note that the average is taken over all possible set elements for a fixed topology. Thus, we have
(6.35)
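The counting argument behind (6.33) and (6.34) can be checked on a small example: summing the subtree sizes over all nodes gives the same total as summing the multiplicities (number of ancestors plus one) over all nodes. The sketch below assumes, as in the text, that every node has at most one parent, and represents the longest-paths forest as a hypothetical parent map; the tree used is made up for illustration.

```python
def subtree_sizes(parent):
    """Size of the subtree rooted at each node: reachable nodes plus one."""
    size = {node: 0 for node in parent}
    for node in parent:
        ancestor = node
        while ancestor is not None:      # every node counts once for itself
            size[ancestor] += 1          # and once for each of its ancestors
            ancestor = parent[ancestor]
    return size


def multiplicities(parent):
    """Number of subtrees each node occurs in: ancestors plus one."""
    mult = {}
    for node in parent:
        count, ancestor = 1, parent[node]
        while ancestor is not None:
            count += 1
            ancestor = parent[ancestor]
        mult[node] = count
    return mult


if __name__ == "__main__":
    # Hypothetical forest: node 0 is a root, 1 and 2 are its children, 3 hangs under 1.
    parent = {0: None, 1: 0, 2: 0, 3: 1}
    sizes = subtree_sizes(parent)
    mults = multiplicities(parent)
    print(sizes, mults)
    print(sum(sizes.values()) / len(parent))   # average subtree size
    print(sum(mults.values()) / len(parent))   # average multiplicity
```

Both averages coincide, which is why the analysis can switch from subtree sizes to multiplicities.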
From Theorem 3 we know that a maximal common subsequence is equivalent to a maximal increasing subsequence, which is denoted by (see Definition 17). As a consequence, we have
(6.36)
12 This implies that , on average.
[Figure 6.32 drawing: (a) packing and (b) constraint graph, with node labels of the form distance / weight.]
Figure 6.32: (a) The packing of Figure 6.31 after increasing the width of module 6, and (b) the associated (unchanged) horizontal constraint graphs and (changed) longest paths subtree.
Thus, given a random sequence pair (with an associated constraint graph), the following can be derived. Using (6.36) we have
this results in
Applying the Euler-Maclaurin summation formula to this finite sum, the approximation
we have
So the expected size of a subtree of is . As a consequence, on average affected nodes affected edges .
(6.37)
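The single-permutation view used in this derivation can be made concrete: a longest common subsequence of the two sequences of a sequence pair has the same length as a longest increasing subsequence of the permutation obtained by rewriting one sequence in terms of positions in the other. The sketch below uses the standard patience-sorting technique with binary search; the function name and the example sequence pair are illustrative assumptions, not taken from the thesis.

```python
from bisect import bisect_left


def longest_common_subsequence_length(first, second):
    """Length of a longest common subsequence of two permutations.

    `second` is rewritten as positions in `first`; a common subsequence then
    corresponds to an increasing subsequence of that position list.
    Runs in O(n log n) for sequences of length n.
    """
    position = {module: index for index, module in enumerate(first)}
    permutation = [position[module] for module in second]

    # tails[k] holds the smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for value in permutation:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


if __name__ == "__main__":
    first = [1, 2, 3, 4, 5]
    second = [2, 1, 3, 5, 4]
    print(longest_common_subsequence_length(first, second))
```

For the example pair one longest common subsequence is 1, 3, 5, so the printed length is 3.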
[Figure 6.33 drawing: (a) packing and (b) constraint graph, with node labels of the form distance / weight.]
Figure 6.33: (a) The packing of Figure 6.31 after an -swap of nodes 4 and 7, and (b) the associated (changed) horizontal constraint graphs and (changed) longest paths subtree.
However, this is the average computational complexity which is needed in the worst case to update after deleting or inserting an edge or changing an edge length, hereafter collectively denoted by edge change [91]. Intuitively it is plausible that efficient incremental algorithms for single-source longest paths in general graphs, such as described in [90, 91, 92], are not most efficient when applied to our restricted problem instances. It turns out that a more efficient approach can be used on the constraint graphs, by exploiting the knowledge that there is a lot of correlation between the longest paths induced by edge changes. Under the assumption that each module is equally probable to be affected, the expected size of a longest-paths subtree rooted at a randomly chosen node is directly obtained by taking the expectation of both sides of (6.34) and combining with (6.37):
Finally, applying (6.35) we obtain
(6.38)
This implies that the expected amount of change in is per affected node. This can be seen as a lower bound for the expected amount of work to be performed for a single affected node. Let us now analyze the incremental longest-paths (ILP) algorithm shown in Figure 6.30. The initialization on lines 1-3 takes steps. The set of affected nodes is assigned to , the set of candidate modified nodes, on line 4. The actions on lines 11-13 are only performed for the candidate nodes, for which holds. Taking , yields a too optimistic estimation. Under the reasonable assumption that a longest-paths subtree rooted at any one of the start nodes derived from set is highly likely to contain other affected nodes, the expected total number of iterations of the while loop at line 7 can be better approximated by as a consequence of (6.38). Thus is a better estimation. This also implies that the expected number of times that the while loop at line 5 is executed is , since candidate (and affected) nodes are subtracted from as they are encountered during the recomputation of the longest-paths subtrees. In other words, each time the algorithm executes line 5, is smaller than the previous time. Each invocation of line 15 explores nodes. Function adjust_heap operates within and is called at most times as a consequence of the assumption. Furthermore, the average computational complexity of function find_start_nodes implemented with splay trees is at most . All other operations have complexity. Summing up these results, an approximation of the average computational complexity of the incremental longest-paths algorithm is
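[Equation (6.39): a sum of cost terms annotated with their originating lines of Figure 6.30 (lines 1-3, 5, 6, 7, 8, 9-14, 16, 18, 21 and 25), followed by its simplification through big-O calculus.]
(6.39)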
Note that all and operations are conditional. This completes the analysis. It can be concluded that under reasonable assumptions, the ILP algorithm has near-optimal computational complexity.
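The change-propagation idea analyzed above can be sketched in a simplified form. The code below is not the ILP algorithm of Figure 6.30: it assumes a fixed topology in which only node weights change, it orders the priority queue by topological rank rather than by distance, and it rebuilds the predecessor lists and ranks on every call, which a real incremental implementation would maintain instead. All names are hypothetical. It follows the node-weighted convention of the text, where every outgoing edge of a node carries that node's weight.

```python
import heapq


def topological_order(succ):
    """Topological order of a DAG given as {node: [successors]}."""
    indegree = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indegree[w] += 1
    ready = [v for v in succ if indegree[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order


def longest_paths(succ, weight):
    """From-scratch longest distances; source nodes get distance 0."""
    dist = {v: 0 for v in succ}
    for v in topological_order(succ):
        for w in succ[v]:
            dist[w] = max(dist[w], dist[v] + weight[v])
    return dist


def incremental_update(succ, weight, dist, changed):
    """Repair `dist` in place after the weights of `changed` nodes were modified.

    Returns the set of nodes whose distance actually changed.
    """
    pred = {v: [] for v in succ}
    for v in succ:
        for w in succ[v]:
            pred[w].append(v)
    rank = {v: i for i, v in enumerate(topological_order(succ))}

    heap = [(rank[w], w) for v in changed for w in succ[v]]
    heapq.heapify(heap)
    touched = set()
    while heap:
        _, v = heapq.heappop(heap)
        new_dist = max((dist[p] + weight[p] for p in pred[v]), default=0)
        if new_dist != dist[v]:          # propagate only real changes
            dist[v] = new_dist
            touched.add(v)
            for w in succ[v]:
                heapq.heappush(heap, (rank[w], w))
    return touched


if __name__ == "__main__":
    succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
    weight = {0: 0, 1: 58, 2: 91, 3: 10}
    dist = longest_paths(succ, weight)
    weight[1] = 100                      # widen one module
    print(incremental_update(succ, weight, dist, changed=[1]), dist)
```

Because nodes are popped in topological order, every predecessor of a node has its final distance before the node itself is recomputed, and the work stays proportional to the part of the graph that is actually affected (plus the bookkeeping this sketch rebuilds).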
be updated using the incremental longest-paths algorithm with the initial affected set equal to . It is clear that the -swap induces most work for the incremental update algorithms. Therefore, the associated computational complexity is an upper bound on the average-case complexity over all perturbations.
(6.40)
The above definition of slack space implies that a 0% slack space packing is an optimal packing. A 50% slack space packing contains the same amount of empty and non-empty space. Larger slack space values are associated with progressively worse packings. Note that an optimal packing is not always a 0% slack space packing. Summarizing, we conduct various experiments to establish experimental evidence of
- the correctness of the theoretical analyses for a single random iteration,
- the practical computational complexity under the assumption of equiprobable sequence-pair selection,
- the validity of the theoretical assumptions in a practical SA optimization environment which employs a large sequence of iterations,
- the efficiency of an incremental approach over a conventional approach in a practical simulated annealing environment.
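To make the slack space measure used in these experiments concrete, the sketch below computes it under an assumed formula: empty area divided by bounding-box area, expressed as a percentage. This formula is inferred from the two properties stated above (0% means no empty space, 50% means equal empty and occupied space) and may differ in detail from (6.40). Modules are assumed to be non-overlapping rectangles given as hypothetical (x, y, width, height) tuples.

```python
def slack_space_percent(modules):
    """Percentage of the packing's bounding box not covered by modules.

    `modules` is a list of (x, y, width, height) tuples that do not overlap.
    """
    left = min(x for x, _, _, _ in modules)
    bottom = min(y for _, y, _, _ in modules)
    right = max(x + w for x, _, w, _ in modules)
    top = max(y + h for _, y, _, h in modules)

    bounding_area = (right - left) * (top - bottom)
    module_area = sum(w * h for _, _, w, h in modules)
    return 100.0 * (bounding_area - module_area) / bounding_area


if __name__ == "__main__":
    tight = [(0, 0, 2, 1), (0, 1, 2, 1)]      # fills its bounding box
    diagonal = [(0, 0, 1, 1), (1, 1, 1, 1)]   # half of the box is empty
    print(slack_space_percent(tight), slack_space_percent(diagonal))
```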
[Figure 6.34 plots: four panels (a)-(d), each with the number of modules (50 to 300) on the horizontal axis.]
Figure 6.34: (a) Average CPU time of a placement computation as a function of the number of modules and (b)-(d) manipulated curves.
curve is clarified by three additional plots shown in Figure 6.34(b)-(d). They are obtained from the original curve by , and , respectively.
If, indeed, then it is expected that is equal to . From Figure 6.34(c) we can see that is quite noisy with a very small positive trend, implying that the original curve follows (6.39) quite well. This is confirmed by Figure 6.34(d), showing a decreasing trend towards zero. Hence, we may conclude that the average computational complexity of the implemented algorithms is near optimal.
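The curve manipulation used for Figure 6.34(b)-(d) follows a general pattern: divide the measured values by a candidate growth function and inspect the trend of the result. A roughly flat result supports the candidate, a rising one means the candidate grows too slowly, and a result falling towards zero means it grows too fast. The sketch below illustrates this on synthetic data; the candidate functions and the data are assumptions chosen for illustration, not the measurements or the exact divisors of the thesis.

```python
import math


def normalize(sizes, values, growth):
    """Divide each measured value by a candidate growth function of its size."""
    return [value / growth(size) for size, value in zip(sizes, values)]


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    sizes = list(range(50, 301, 50))
    # Synthetic "measurements" that grow exactly like n log n.
    times = [3.0 * n * math.log(n) for n in sizes]

    candidates = {
        "n": lambda n: n,
        "n log n": lambda n: n * math.log(n),
        "n^2": lambda n: n * n,
    }
    for name, growth in candidates.items():
        print(name, slope(sizes, normalize(sizes, times, growth)))
```

On real timing data the flat case will be noisy rather than exactly zero, so the sign and size of the slope should be read relative to the scatter of the measurements.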
A plot of the slack space of the optimized packings for three independent runs is shown in Figure 6.35. It is clear from this figure that there is quite some variation among the results of different runs of the same problem instance. Obviously, our implementation of the SA optimization algorithm has problems getting out of local optima when the number of modules is small. This could be explained intuitively by the fact that with a small number of modules, a perturbation easily leads to a relatively large change in the cost value. The net result is a very irregular cost landscape and, therefore, worse convergence properties of the optimization algorithm. Furthermore, we can see from the figure that the amount of slack space increases significantly as a function of the problem instance size. This phenomenon can be explained by the relatively simple perturbation scheme that is used in our optimization framework. As we only consider relative perturbations with no knowledge of the absolute positions of the modules, many generated moves will affect modules which are spatially far apart. In the current approach there is no way of choosing modules which are relatively close together, even when the optimization is in its final phase. In other words, we cannot force the SA algorithm to sample the solution space more smoothly when optimization proceeds. Clearly, this unwanted effect will be increasingly more pronounced with increasing . We may conclude that the sampling behavior of our SA optimization scheme will become increasingly
more inefficient for problem instances containing more than roughly 50 modules. Note that no tuning was involved for obtaining these results.
[Figure 6.35 plot: slack space versus problem instance size (50 to 350).]
Figure 6.35: A plot of the slack space of several packings.
Figure 6.36 shows the CPU time for a single packing optimization run for a range of problem instances. The plot indicates a super-linear growing trend in CPU time for computing packings as a function of the number of modules in a packing. A closer inspection reveals that the trend is quadratic. One might wonder if there is a direct relationship between the computational complexity of a complete optimization run and the complexity of a single iteration. As will be clear shortly, indeed, a close correlation between these two complexities can be observed.
[Figure 6.36 plot: CPU time for packing optimization without routing considerations, versus problem instance size (50 to 350).]
Figure 6.36: CPU time of a complete optimization run as a function of problem instance size for three independent runs per problem instance.
We also verify the validity of the equiprobable selection assumption of Theorem 5 by plotting the average longest-paths tree size of the final optimization result, for a wide range of problem instances. The program is run three times for each problem instance size. The average subtree size as a function of the problem instance size is plotted in Figure 6.37. The plot shows a clear sub-linear trend as a function of . Indeed, the trend is according to which is evident from the plot shown in Figure 6.38, which is the result of dividing the values plotted in Figure 6.37 by .
[Figure 6.37 plot: Average subtree size as a function of , without routing considerations.]
Figure 6.37: The average subtree size in the longest-paths graph as a function of the problem instance size after packing optimization without routing considerations.
[Figure 6.38 plot: Exposed trend of average subtree size without routing considerations.]
Additionally, it is interesting to verify whether these results also hold under different circumstances, for instance when we change the cost function (4.3) to include routing issues. For the moment, we only mention that a sophisticated (global) routing scheme, denoted by SPBH I, is used which will be discussed in detail in Chapter 7. The cost function weights are set to: , and . The obtained results indicate that the average subtree size grows according to a function which lies in between and . Finally, we show in Figure 6.39 the CPU time of incremental packing optimization versus non-incremental packing optimization for a range of randomly generated benchmarks and the largest MCNC benchmark ami49. It is clear from this plot that the incremental placement computation approach outperforms the non-incremental placement computation approach starting from about . This means that the incremental approach is practically feasible and leads to increasingly larger improvements as grows larger.
[Figure 6.39 plot: CPU time incremental versus non-incremental packing optimization, logarithmic axes, curves labeled "nonincremental" and "incremental".]
Figure 6.39: The CPU time of incremental and non-incremental packing optimization as a function of the problem instance size without routing considerations ( , ).
6.9.3 Conclusions
The assumption of equiprobable sequence-pair selection for analyzing the average computational complexities in connection with incremental sequence-pair-to-packing computation is justified. With and without consideration of global routing the average subtree size of a longest-paths graph with nodes lies between and . The previous complexities hold both for a single sequence-pair-to-packing iteration as well as for an actual sequence of iterations within a practical SA optimization run. From the experimental results we can clearly observe that the performance of the optimization framework depends on the size of the problem instance at hand, and is arguably dependent on the quality of the generated perturbations (moves). It is likely that a more sophisticated perturbation scheme which generates (mostly) better moves, in terms of a higher probability of acceptance, will improve overall performance of the SA optimization framework. Finally, we observed an approximate one-to-one correlation between the complexity of a single sequence-pair-to-packing iteration and the time required for the optimization process to arrive at a final solution. Since the average computational complexity of a single incremental placement computation iteration is , which is better than the computational complexity of any from scratch placement computation algorithm (either one of , , and ), the overall run time of an incremental SA optimization run must be better for all . Indeed, when we compare a small-constant-factor quadratic implementation with our (unoptimized) incremental implementation, we find .
When a placement is input via a graphical user-interface, one needs to translate it to a sequence pair representation before any automated concepts can be applied to the placement. An actual placement gives exact information on spatially close modules to a specific module, as opposed to the relative sequence pair representation or (equivalent) oblique grid representation. Therefore, the actual placement is better suited for use in connection with choosing distant or close modules which is useful for selecting perturbation types in a sophisticated implementation of simulated annealing. As will be demonstrated in Chapter 8, Section 8.7, efficient enumeration of all modules in a placement is very useful for the minimization of performance-degrading physical coupling phenomena. We argue that efficient enumeration is a key ingredient for placement-to-sequence-pair computation.
In [48] a method called gridding was introduced to map a packing to an equivalent sequence pair (SP) representation. However, the described approach is quite ambiguous, in the sense that the packing can be mapped to more than one SP. Furthermore, if modules are placed with cutting zones of slack space such as in Figure 6.40, it is impossible to apply Murata's gridding procedure. As argued in [94] it is possible to determine a sequence pair from a given packing by the following procedure. To determine sequence , push the modules out of the packing in a top-left order without having to move aside any other module. For sequence the same push-out procedure can be applied, except that it should now occur in a bottom-left order. It is known that this procedure does not always yield a unique solution, which can easily be seen from the example cases in Figure 6.41.

Figure 6.41: The mapping from packing to SP is not always unique and depends on the actual sizes of the modules. (Panels: packing of 10 modules with the corresponding SP; an ambiguous case; an unambiguous case.)

We note a serious shortcoming in the previous push-out algorithm, namely that not all placements can be mapped to the original (or equivalent to the original) sequence pair [94], and, as a result, establish a new observation which is called the idempotent property of the placement-to-sequence-pair mapping. Formally, if we denote the mapping from sequence pair to a packing by , and denote the mapping from a packing to a sequence pair by , then is said to establish an idempotent mapping, or is simply called idempotent, if

(6.41)
Due to the fact that many sequence pairs can map to exactly the same packing (depending on the sizes of the modules), is not unique. In [94] an attempt is made to formalize this idea under the ambiguously defined notion of 1-dimensional compaction. We circumvent the non-uniqueness of in this discussion by stating a natural requirement for . In words, (6.41) means that when a packing is re-computed for a sequence pair which is obtained by applying mapping to the originally computed packing, then these two packings should be equal in every sense. The procedure for proposed in [94] does not guarantee an idempotent mapping as defined by (6.41). This can easily be seen from the packing in Figure 6.42, which is the result of that procedure applied to the packing of Figure 6.40, yielding sequence pair . The discrepancy is caused by the cutting zone of slack space which isolates a module or group of modules from its left or lower neighbor module while these modules could be shifted leftward or downward without incurring overlap (footnote 13). At least two ways exist to resolve the problem.
Remove cutting zones of slack space by (virtually) enlarging the size of specic modules.
13 Note that the relative relationships dictated by the sequence pair are violated by doing so.
Figure 6.42: The method proposed in [94] to compute a sequence pair from a packing does not yield an idempotent mapping, which is shown clearly by this placement, where the previous location of a shifted module is drawn in a darker shade.
Adapt the naive packing-to-sequence-pair push-out method so that it takes cutting zones of slack space into account.
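The idempotence requirement and the way slack space breaks it can be made concrete with a naive push-out sketch. This is an illustration under stated assumptions and not the procedure of [94] or the corner-stitching algorithm formalized next: the helper names, the tie-breaking by module id, and the test "no remaining module reaches into the quadrant a module is pushed towards" are all hypothetical choices. Writing f for the sequence-pair-to-packing mapping and g for the reverse mapping (names assumed here), the check is whether f(g(P)) reproduces the packing P.

```python
def sp_to_packing(gamma_pos, gamma_neg, sizes):
    """Quadratic sequence-pair evaluation; returns id -> (x, y)."""
    pos = {m: i for i, m in enumerate(gamma_pos)}
    xy = {}
    for k, b in enumerate(gamma_neg):
        xb = yb = 0
        for a in gamma_neg[:k]:
            if pos[a] < pos[b]:                      # a is left of b
                xb = max(xb, xy[a][0] + sizes[a][0])
            else:                                    # a is below b
                yb = max(yb, xy[a][1] + sizes[a][1])
        xy[b] = (xb, yb)
    return xy


def push_out(rects, blocks):
    """Repeatedly remove a module that no remaining module blocks."""
    remaining = sorted(rects)          # ties broken by module id (assumption)
    order = []
    while remaining:
        for m in remaining:
            if not any(blocks(rects[o], rects[m]) for o in remaining if o != m):
                order.append(m)
                remaining.remove(m)
                break
        else:
            raise ValueError("no module can be pushed out; modules overlap")
    return tuple(order)


def packing_to_sp(rects):
    """rects: id -> (x, y, w, h). Returns the pair of push-out orders."""
    # o blocks m towards the top-left if it reaches left of m's right edge
    # and above m's bottom edge.
    def top_left(o, m):
        return o[0] < m[0] + m[2] and o[1] + o[3] > m[1]

    # o blocks m towards the bottom-left if it reaches left of m's right edge
    # and below m's top edge.
    def bottom_left(o, m):
        return o[0] < m[0] + m[2] and o[1] < m[1] + m[3]

    return push_out(rects, top_left), push_out(rects, bottom_left)


def round_trip_preserves(rects):
    """True if re-packing the derived sequence pair reproduces the packing."""
    gamma_pos, gamma_neg = packing_to_sp(rects)
    sizes = {m: (w, h) for m, (_, _, w, h) in rects.items()}
    placed = sp_to_packing(gamma_pos, gamma_neg, sizes)
    return all(placed[m] == (x, y) for m, (x, y, _, _) in rects.items())


tight = {0: (0, 1, 2, 1), 1: (2, 1, 1, 2), 2: (0, 0, 3, 1)}
slack = {0: (0, 0, 1, 1), 1: (2, 0, 1, 1)}   # gap between x = 1 and x = 2
```

For the compact packing `tight` the round trip returns the same coordinates, whereas for `slack` module 1 is shifted left into the gap, which is the kind of discrepancy the two remedies listed above are meant to remove.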
First, we formalize the packing-to-sequence-pair (P2SP) method into an algorithm. With a slightly adapted version of the area enumeration operation of the corner stitching data structure (see Chapter 5), it is possible to realize the packing-to-sequence-pair algorithm. The aforementioned algorithm, shown in Figure 6.43, has computational complexity , where is the number of modules in the packing.

1.
2. Enumerate all left-side modules in the packing using corner stitches and push them into in order of occurrence from top to bottom.
3. Extract the topmost module from and push out .
4. Recursively push out the topmost right-side module, say , of .
5. If no more module can be found for pushing out and is non-empty, then go to step 3, else stop.
If the possibility occurs to push out a right-side and a bottom-side module, the right-side module has priority over the bottom-side module. If a bottom-side module is pushed out then
Figure 6.43: Algorithm packing-to-sequence-pair (P2SP), which essentially enumerates the elements of sequence (and ) in an efficient manner and in a specific order.

This can be easily understood as follows. Step 2 enumerates all modules at the left boundary of the chip area. In the worst case, this takes , and on average this is . During step 3, the modules are extracted in a first-in-first-out manner. Hence the total computational complexity of extracting all modules of takes in the worst case. The most time-consuming operation of step 4 is the traversal of all neighboring modules at the right side of a given module. As neighbor enumeration is performed once for every pushed-out module, and the computational complexity (with hint) for neighbor enumeration is , the amortized computational complexity of step 4 is when implemented efficiently. As a result, the overall (average) computational complexity is . The determination of sequence can be performed in a similar
manner, except now the modules are pushed out starting from the bottom-left corner.

We will propose here a method to guarantee an idempotent mapping by virtually expanding specific modules. Therefore, we call this approach the expansion method. For the sake of simplicity, but without loss of generality, the algorithm is discussed for one dimension, i.e. the horizontal expansion case. Expansion uses the steps shown in Figure 6.44. Note that the placement is assumed to be converted into an equivalent corner-stitching data structure (see Chapter 5). As a result, every piece of space in the packing is represented by an empty rectangle or a non-empty rectangle (module).

1.
2.
3. Check if all touching rectangles, found by neighbor enumeration with hint, to the right of are empty.
4. If so, then expand with the horizontally smallest rectangle size.
5. If not, then expansion is not possible for module (the module is tight).
6.
Figure 6.44: An algorithm for expanding modules in one dimension.

Clearly, the algorithm has computational complexity equal to , since the modules are processed in an efficient order given by , and step 3 is performed with amortized computational complexity . Note that an inefficient processing order of the modules could easily result in complexity. This occurs, for instance, when we need to search back and forth in the corner stitching data structure for certain modules. Figure 6.45 shows the packing of 10 modules after (horizontal) expansion, which is the equivalent packing of Figure 6.40. Note that in most cases there will be no slack space left after full expansion of a packing in two dimensions. The reader can easily verify that expansion implies that algorithm P2SP is equivalent to an idempotent packing-to-sequence-pair mapping.

Figure 6.45: A (horizontally) expanded packing of 10 modules with SP ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)).

The alternative approach, as mentioned earlier, is to guarantee an idempotent packing-to-sequence-pair mapping by refining algorithm P2SP such that ambiguity is prevented by making explicit use of the empty rectangles. This is not further elaborated in this thesis.
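The effect of horizontal expansion can be illustrated without a corner-stitching structure. The sketch below is a quadratic stand-in under stated assumptions, not the algorithm of Figure 6.44: instead of inspecting the empty rectangles that touch a module's right edge, it grows each module up to the left edge of the nearest module to its right that overlaps it vertically, or up to an assumed chip boundary `chip_width`. The function name and the rectangle encoding are hypothetical.

```python
def expand_right(rects, chip_width):
    """Virtually widen modules to absorb slack space on their right side.

    rects: id -> (x, y, w, h). Returns a new dict with enlarged widths.
    A module whose right edge already touches another module stays as it is
    (it is 'tight' in the sense used above).
    """
    expanded = {}
    for m, (x, y, w, h) in rects.items():
        # Never shrink a module, even if chip_width is set too small.
        limit = max(chip_width, x + w)
        for o, (ox, oy, ow, oh) in rects.items():
            overlaps_in_y = oy < y + h and y < oy + oh
            if o != m and overlaps_in_y and ox >= x + w:
                # o is an obstacle somewhere to the right of m
                limit = min(limit, ox)
        expanded[m] = (x, y, limit - x, h)
    return expanded


# Two unit squares separated by one unit of slack on a chip of width 3.
slack = {0: (0, 0, 1, 1), 1: (2, 0, 1, 1)}
grown = expand_right(slack, chip_width=3)
```

Because only right edges move and obstacles are detected by their unchanged left edges, the processing order does not matter in this naive version; the order-dependent cost discussed above arises only when the neighbors are found through corner stitches.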
A constrained block adheres to its range constraint if it lies inside that region, otherwise it violates its range constraint.

Definition 23 (Boundary constraint) A block has a boundary constraint if the block is to be placed at one of the four sides of the packing area.

A corner constraint is also a boundary constraint. In that case the module is constrained to two adjacent boundaries. Some authors use a pre-placed module constraint, but this constraint can be seen as a special case of either a range constraint or a boundary constraint. For example, if in case of a range constraint the range is set to the actual width and height of a module, the module is effectively pre-placed. Note that a pre-placed module can also be seen as an obstacle in the placement area [46].

We will discuss hereafter the incorporation of range and boundary constraints into both the non-graph-based placement computation approach and the graph-based placement computation approach. The original idea for the former approach was introduced very recently by Tang and Wong [47]. Also, the incorporation of range and boundary constraints into the incremental graph-based placement computation approach is discussed. Note that matching does not fall under range constraints because a range constraint requires a priori knowledge of absolute placement information. This information is not always available for matched modules. However, as discussed in Chapter 4, matching constraints can be taken into account using techniques such as those described in [45].
Figure 6.46: (a) Range constraints are enforced by essentially four types of dummy blocks for each placement-constrained block. The dimensions and depend on the (desired) total chip width and height . (b) Boundary constraints are enforced by four types of dummy modules which have dimensions that depend on the (desired) chip dimensions and .
and height 0, one to the right of with width and height 0, one at the bottom side of with height and width 0, and one at the top side of with height and width 0. Note that depends on the height of the chip, and depends on the width of the chip. Actually these values are pre-set desired values and adapted during optimization. In [47] this issue is handled by defining initial values for both and such that equals 150% of the sum of the individual block areas. During optimization by means of simulated annealing, both or either one of and are randomly chosen to be decreased by a certain amount when a constraint-violation-free placement is found. Furthermore, the cost function used in [47] is defined as
(6.42)
where and are the actual width and height of the chip area, stands for wire-length, and and are weight factors. Ideally, at the end of the optimization process, and . However, it is clear that during optimization, and frequently occurs. In such a case we have the situation that the dimensions of the current placement do not comply with the desired dimensions and . A straightforward solution would be to penalize such cases with very high cost values. A direct consequence of this measure is that the cost landscape is rendered unnecessarily irregular. This, in turn, badly affects the convergence behavior of the simulated annealing algorithm. Fortunately, there is a more elegant solution. Since by construction we have and , we can simply put and directly in the cost function of (6.42). Minimizing the cost function implies minimizing and . This approach seems to work remarkably well, as evidenced by the results in [47] (footnote 14).

For completeness, and for the sake of overview, we give the general non-graph-based placement computation algorithm of Figure 6.17 again in Figure 6.47. We call it the constrained maximum-weight common subsequence (CMWCS) algorithm. The essential difference is located on lines 12 and 14, which enforce the placement constraints for the horizontal case. The vertical case is similar.

Input: sequence pair , element weights and target dimension
Output: realized width of the chip area and (partially) constrained positions of all modules
(Listing of 22 numbered lines; only the control-flow keywords survive: a for loop over the sequence with predecessor and successor queries, insert and delete operations, and a final return.)
Figure 6.47: The non-graph-based placement computation algorithm, which is essentially a constrained maximum-weight common subsequence (CMWCS) algorithm, directly handles range and boundary constraints on blocks. The essential difference with the general algorithm of Figure 6.17 lies in lines 12 to 14, in which the placement constraints are enforced. This algorithm applies to the placement computation for the horizontal direction.

Line 12 checks if module has a constraint by verifying (in O(1), constant time) whether or not it is an element of the set of constrained modules , which consists of the range-constrained modules and the boundary-constrained modules . Of course, (footnote 15). Line 2 assigns the correct values to left- and right-side constraints associated with a range-constrained module. Lines 3 and 4 establish similar results for left-boundary and right-boundary constrained modules, respectively. If a module is constrained then the position of module is adapted such that the constraint is adhered to. Essentially, the dummy modules associated with the constraints force the module-under-constraint into a certain preferred region of the placement area. At line 12 a left-side constraint on a module is enforced. At line 14 we check again if is a constrained module. This time the width of the total chip area is adapted in such a way that a violation of the right-side constraint will induce a larger chip width. Hence, violations are penalized and thus minimized.

14 Unfortunately, the authors of [47] did not reveal any detailed information on their implementation of the simulated annealing algorithm. Hence, comparison with results of other works should be done with care.
15 Note that the top and right chip boundary locations are unknown beforehand.

The overall computational complexity of algorithm CMWCS can be derived in a similar fashion as for algorithm MWCS shown in Figure 6.17, being , where is the amortized computational complexity of the priority queue operations. If we use a priority queue based on splay trees we obtain O(n log n), and if we use a Van Emde Boas data structure [58, 59] a better result, O(n log log n), is obtained.

We observe an important point which is worthy of further investigation because it can lead to a simplified overall optimization algorithm and give better placement results. The observation is that the use of stochastic adaptation of the target dimensions adds to the computational complexity of the problem. Moreover, the impact of this stochastic adaptation on the overall performance is not known. Therefore, it is probably better to avoid it. We propose a modified 2-step algorithm to compute a constrained placement without the use of iterative adaptation of and . The algorithm is shown in Figure 6.48. Essentially, the algorithm, which we denote by CMWCS2, is the same as the original algorithm except for the fact that the input target dimension is not needed anymore. The algorithm finds this dimension by performing a first placement pass (from line 1 to 16). The found value of guarantees that no redundant margin is introduced. Consequently, no estimation error is made, which will eventually lead to better results in fewer iterations. Experimental results which confirm this claim are presented next. Note that the boundary constraints can be easily generalized to include corner constraints. This is accomplished by enforcing both a top or bottom constraint and a left or right constraint.
Experiments show that neither the run-time performance nor the solution quality deteriorates (under a reasonable number of constraints).
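The two-pass idea can be illustrated for the horizontal direction with a small sketch. It is loosely modeled on the description above and is not the priority-queue listing of CMWCS or CMWCS2: positions are evaluated in quadratic time, and the reading of the dummy blocks (a left dummy of width equal to the lower range bound, a right dummy of width target minus upper range bound, and a left dummy of width target minus module width for a right-boundary module) is an assumption. All function and parameter names are hypothetical.

```python
def x_positions(gamma_pos, gamma_neg, widths, left_min):
    """x-coordinates from a sequence pair, with per-module lower bounds."""
    pos = {m: i for i, m in enumerate(gamma_pos)}
    x = {}
    for k, b in enumerate(gamma_neg):
        xb = left_min.get(b, 0)          # acts like a left dummy block
        for a in gamma_neg[:k]:
            if pos[a] < pos[b]:          # a is left of b
                xb = max(xb, x[a] + widths[a])
        x[b] = xb
    return x


def two_pass_constrained_x(gamma_pos, gamma_neg, widths,
                           x_range=None, right_boundary=()):
    """Return (x, target, realized) without a user-supplied target width.

    x_range: module -> (x_lo, x_hi) allowed horizontal span.
    right_boundary: modules that must touch the right chip boundary.
    """
    x_range = x_range or {}
    left_min = {m: lo for m, (lo, _) in x_range.items()}
    # Pass 1: determine the target width with left-side constraints only.
    x = x_positions(gamma_pos, gamma_neg, widths, left_min)
    target = max(x[m] + widths[m] for m in x)
    # Pass 2: right-boundary modules are pushed against the target width.
    for m in right_boundary:
        left_min[m] = max(left_min.get(m, 0), target - widths[m])
    x = x_positions(gamma_pos, gamma_neg, widths, left_min)
    realized = max(x[m] + widths[m] for m in x)
    for m, (_, hi) in x_range.items():
        # Right dummy block: a range violation pushes the width past target.
        realized = max(realized, x[m] + widths[m] + (target - hi))
    return x, target, realized


widths = {0: 2, 1: 1, 2: 5}
ok = two_pass_constrained_x((0, 1, 2), (2, 0, 1), widths,
                            x_range={0: (0, 2)}, right_boundary=(1,))
violated = two_pass_constrained_x((0, 1, 2), (2, 0, 1), widths,
                                  x_range={0: (0, 1)}, right_boundary=(1,))
```

In the first call the realized width equals the target found in pass 1, so no constraint is violated; in the second call module 0 cannot fit in its range and the realized width exceeds the target, which is what a cost function based on the realized width would penalize.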
Input: sequence pair and element weights
Output: realized width of the chip area and (partially) constrained positions of all modules
(Listing of 38 numbered lines; only the control-flow keywords survive: two consecutive for loops, each with predecessor and successor queries and insert and delete operations, followed by a final return.)
Figure 6.48: The modified non-graph-based placement computation algorithm, which is denoted by CMWCS2, does not require the input of a target dimension, which is a tunable input parameter.
49 blocks with a diverse set of dimensions. Similar to the latest state-of-the-art publication on constrained placement [47], we set the following seven constraints. We select a block to be constrained to one of the four boundaries, and do this for each boundary. Moreover, we select three blocks to be constrained within the same preselected placement area. Furthermore, an attempt was made to choose blocks of the same size as in the reference publication, but due to a different block labeling scheme there might be some inconsistency here. However, a small inconsistency will not affect the results significantly. We ran both CMWCS and CMWCS2 20 times on ami49. Since we still have a tuning parameter in CMWCS, embodied by the decrement size of the target dimension(s) each time a constraint-violation-free placement is seen, we compare the CMWCS2 results with the best results from a series of CMWCS runs with different values of the decrement parameter. The values we used for this parameter are: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 and 0.9999. More specifically, when a constraint-violation-free placement is found, the chosen target dimension, say , is adjusted as follows: , with the decrement parameter equal to 0.98. Figure 6.49 shows the behavior of the solution quality over 20 runs of the SA algorithm for various values of the decrement parameter. Both the average and the best solutions in terms of chip area are plotted. For comparison purposes, the average and best chip area obtained by algorithm CMWCS2 are also plotted in the same figure. We can see that algorithm CMWCS2 consistently yields solutions close to the best solution. Moreover, the best CMWCS2 solution is typically better than the best solution over all 220 runs of the CMWCS algorithm. In all cases the average CMWCS solution is significantly worse than the average CMWCS2 solution.
It should be noted that a more honest comparison should compare 220 runs of algorithm CMWCS2 with the best solution obtained with algorithm CMWCS. Indeed, from additional experiments we obtained a value of 36.25 mm² after 80 runs, which is already better than the best result obtained with algorithm CMWCS. Consequently, we may state that algorithm CMWCS2 is very robust, does not require (problem-dependent) tuning, and yields excellent solutions. However, as is clear from the structure of the algorithms, CMWCS2 in its present form is bound to be significantly slower than CMWCS. Indeed, this is confirmed by Table 6.5, which summarizes some additional information gathered from the experiments.

Table 6.5: Experimental results of the original constrained placement computation algorithm (CMWCS) and the proposed improved version (CMWCS2). The averages are taken over 20 runs with random seeds.

algorithm                   CMWCS     CMWCS2
average #iterations         564776    718405
average #rejections         364908    469392
average CPU time [s]        89.00     221.56
average CA [mm²]            37.14     36.75
average slack space [%]     4.56      3.55

When compared to the unconstrained optimization results (see Table 6.4), there is no apparent degradation in solution quality due to the imposed constraints. This is quite surprising. It implies that the approach taken for including placement constraints does not (significantly) deteriorate the convergence properties of the overall SA optimization algorithm. The CPU time of CMWCS-based optimization is substantially smaller than that of CMWCS2-based optimization. This is the only drawback of algorithm CMWCS2. However, it may be possible to improve the latter algorithm by using sophisticated techniques. For instance, one could try avoiding recomputation of module positions which are not affected by constrained modules. This is not further explored in this thesis.
Figure 6.49: A graph which shows the dependency of the final chip area of constrained placements obtained with algorithm CMWCS on the decrement parameter value. The problem instance under test is ami49. Each CMWCS value is the best or the average of 20 runs. The horizontal solid and dashed lines are the average and best solutions, respectively, of the CMWCS2 algorithm over 20 runs.
The average number of iterations and rejections (of generated moves) gives a good indication of the quality of the overall optimization algorithm and enables platform-independent comparison of optimization times. An important feature which favors CMWCS2 over CMWCS is the fact that the former does not introduce additional and unnecessary tunable parameters which, among others, adversely affect the stochastic properties of the optimization. Additionally, the tunable parameter settings are likely to be problem-dependent. When we spend some more effort on the generation of a few additional constrained placement results, we can for instance obtain with CMWCS2 the placement shown in Figure 6.50. The result is significantly better than the current state-of-the-art [47] with a reported slack space of 6%. It is interesting to note that the latter result is obtained with about 850,000 iterations and 600,000 rejections [95]. If we constrain module 4 to the top-right corner and module 6 to the bottom-right corner, and set the constraints on the other modules as before, a run of the SA optimization algorithm could give the placement result shown in Figure 6.51.
Figure 6.50: A placement with modules 32, 4, 6, and 0 constrained to the boundary (left, top, right and bottom, respectively), and modules 2, 3, and 5 constrained within the same rectangular range indicated by the dotted lines. The chip area is 36.25 mm² with 2.22% slack space.
Figure 6.51: A placement with modules 32 and 0 constrained to the left and bottom, respectively. Modules 4 and 6 are constrained to the top-right and bottom-right corner, respectively. Modules 2, 3, and 5 are constrained within the same rectangular range indicated by the dotted lines. The chip area is 36.77 mm² and the optimization time is 204 CPU seconds.
We extend this idea to the incremental computation scenario. As before we perform incremental computations directly on the constraint graph representations. The incorporation of placement constraints into the incremental approach is quite straightforward. For simplicity only the horizontal case is discussed here. A module which has an associated constraint, i.e. , is only processed when it is an affected module. The distance update step for a constrained module is
(6.43)
which is very similar to the recompute dist () function which is used for the incremental computation of longest paths. For the dummy end node in the constraint graph, the following update step is sufficient:
(6.44)
is right-boundary constrained
However, (6.44) can be updated more efficiently when the affected range-constrained and right-boundary constrained modules are stored in a separate list which keeps both the old information and the new information. As a result, (6.44) can be computed incrementally. Thus the average computational complexity of incremental graph-based constrained placement is upper bounded by . If the number of constrained modules is fixed and independent of the total number of modules , it is easily seen that the placement computational complexity is not affected by the addition of constraints. Experimental verification of the effectiveness of incremental constrained graph-based placement computation has not been performed. Based on the experimental results of the constrained non-graph-based placement computations, it is expected that the computational complexity is not significantly affected. It is plausible to assume that previously established properties may be extrapolated to the current situation.
Efficient near-optimal incremental placement computation algorithms have been proposed and implemented. Experimental results demonstrated the validity of the theoretical analyses and have shown the feasibility of the incremental approach. However, we note that in a practical framework the usage of both incremental and non-incremental approaches should be considered, depending on certain features of the stochastic optimization engine. Since the non-incremental approach is less sophisticated, it has smaller constant factors, hidden by the big-Oh notation, than the incremental algorithms, whose elaborateness leads to larger constant factors. The difference, however, can be further reduced by optimizing the implementation. We have shown that range and boundary constraints imposed on arbitrary modules can be easily taken into account without incurring computational complexity overhead. A modified algorithm is proposed which is easier to implement, is very robust and consistently yields significantly better solutions than those in current literature. Furthermore, it is shown how the idea of constrained module placement can be easily transferred to incremental placement computation.
Chapter 7
Routing
This chapter covers several aspects related to routing. The routing process is that part of the physical design step with the task of laying out the interconnect between preplaced geometrical structures, as defined in a point-to-point manner by the circuit netlist. The interconnect of a certain net, consisting of wires and vias, is also called the routing of that net. Placement without considering routing in a proper qualitative manner only makes sense in connection with designs where the quality of interconnect has negligible effect on system performance. Normally, these are low-performance designs. Contemporary state-of-the-art mixed-signal integrated circuits require high-quality layouts which are robust in any sense. With increasingly higher operating frequencies into the gigahertz range, and feature sizes going far into the ultra-deep submicron range, routing issues are becoming indisputably dominant. As a consequence, placement should take all quality aspects connected with routing into account. Unfortunately, a routing cannot be computed without having an idea of where the objects that have to be routed are located. But then, how can we find a good routing-aware placement? This problem naturally asks for an iterative approach. See Chapter 4 for a global overview of how placement and routing are integrated into an iterative optimization framework. First, the routing problem is defined exactly. Then, we give a general classification of routing approaches which facilitates categorization of relevant works. This is followed by a brief discussion of relevant previous work. A brief discussion on computational complexity is presented thereafter. We proceed by defining a routing model and routing algorithms which are most promising within a mixed-signal layout generation context.
Based on experimental results, we will choose the routing heuristic that has the best performance compared to other heuristics, relative to optimal routing solutions for a broad range of problem instances. The selection criteria are based on both run-time performance and routing quality. Also, emphasis is put on the incremental capabilities of the adopted routing methodology, which consists of a fast and effective graph-based routing heuristic in combination with an efficient irregular-grid routing model. For the chosen heuristic, we discuss extensions for incremental computations. The incremental routing heuristic is then evaluated and experimental results are reported. Finally, the overall routing methodology is integrated into the iterative optimization framework. Experimental results of the integrated placement and global routing approach are given and compared with existing state-of-the-art works. Furthermore, discrepancies in current works are exposed and discussed. The viability of the adopted methodology is further demonstrated by pinpointing and discussing areas for improvement. Finally, we end with some concluding remarks.
Routing in higher-level metal layers requires the use of additional vias, which are expensive in terms of parasitics and yield.
If intellectual property blocks are used in a module, parasitic interaction due to wires crossing those blocks is best avoided because it is unknown how the circuit performance will be influenced.
Furthermore, we assume that an interconnecting network of wires of which no wire runs over a module, and with minimal total length, is optimal (footnote 1). Moreover, we adopt a rectilinear wiring model in which wire segments are only allowed to go in either horizontal or vertical direction. Such a network is called an obstacle-avoiding rectilinear Steiner minimal tree [96, 19]. Finding such a tree is an NP-hard problem [97, 19]. A standard method to represent Steiner trees uses graphs, where the nodes represent junctions, bends, or crossings and the edges represent possible wiring segments. The graph approach can be applied without loss of generality thanks to Hanan's theorem, stating that a Steiner minimal tree exists in a Hanan grid [98]. The Hanan grid is a rectilinear grid in which the grid lines are induced by the pins, and their crossings form nodes in the graph. Naturally the line segments are the edges in the graph. Formally stated, the global routing problem is as follows.

Problem: Steiner minimal tree (SMT) in a graph (GSMT)
Instance: A graph, a set of pins that form a net, and a weight function on the edges.
Solutions: All trees in the graph that connect the pins of the net.
Minimize: The total weight of the tree.
The nodes in the graph are represented by the set . The edges in the graph, represented by set , are undirected. The pins are also called demand nodes in this context, whereas the other nodes in are called Steiner nodes, which are candidate nodes for the solution trees. There are two special cases which have polynomial-time complexity. The first case is a net with exactly two pins. This case is also known as the single-pair shortest-path problem. Dijkstra's algorithm [99] can be used to solve it in time O(|E| + |V| log |V|), for instance with a Fibonacci heap data structure. Here E and V are the sets of edges and nodes, respectively, which are contained in the equivalent graph enclosing all modules in the rectangular region defined by the modules attached to the 2-pin net at hand. A more efficient target-directed path search algorithm called A* can be used to find an optimal path between two pins in time proportional to the number of edges (and nodes) on the path [100, 101]. The other special case occurs when every node in the graph is a pin, so that all the nodes in the graph need to be connected in a minimal sense. This is called the minimum spanning tree (MST) problem, and it can be solved in O(|E| + |V| log |V|) time with Fibonacci heaps using Prim's algorithm [102]. In appearance, Prim's algorithm is very similar to Dijkstra's algorithm. The fundamental difference is that the former stores the edge weights associated with candidate extension nodes on the heap, while the latter stores the path lengths associated with candidate shortest-path extension nodes on the heap. Unfortunately, all other cases are known to be NP-hard. This even holds for many conceptually simplified versions of the Steiner minimal tree problem. For instance, the Steiner minimal tree problem in planar rectilinear graphs is NP-hard [19].

1 Symmetry
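The close resemblance between the two polynomial-time special cases can be shown with one routine in which only the heap key differs. This is a minimal sketch, assuming an adjacency-dictionary graph and Python's binary heap (`heapq`) in place of a Fibonacci heap, so it runs in O(|E| log |V|) rather than the bound quoted above; the function name `grow_tree` and the flag `use_path_length` are hypothetical.

```python
import heapq


def grow_tree(graph, source, use_path_length):
    """Grow a tree from `source` over an undirected weighted graph.

    graph: node -> {neighbor: edge weight}.
    use_path_length=True  keys the heap on path length (Dijkstra).
    use_path_length=False keys the heap on edge weight (Prim).
    Returns (parent, key) dictionaries.
    """
    key = {source: 0}
    parent = {source: None}
    done = set()
    heap = [(0, source)]
    while heap:
        k, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        for v, w in graph[u].items():
            candidate = k + w if use_path_length else w
            if v not in done and candidate < key.get(v, float("inf")):
                key[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return parent, key


# Triangle in which the shortest-path tree and the spanning tree differ.
graph = {
    "A": {"B": 2, "C": 3},
    "B": {"A": 2, "C": 2},
    "C": {"A": 3, "B": 2},
}
sp_parent, sp_dist = grow_tree(graph, "A", use_path_length=True)
mst_parent, mst_key = grow_tree(graph, "A", use_path_length=False)
```

In the shortest-path tree node C attaches directly to A (path length 3), while in the spanning tree it attaches to B, giving a total tree weight of 4 instead of 5.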
single-step versus two-step approach
This is a classification based on hierarchy. The two-step approach essentially adopts a hierarchical divide-and-conquer strategy (footnote 2) in order to (conceptually) simplify the problem and eventually find good solutions more efficiently (footnote 3).

regular-grid versus irregular-grid approach
This is a classification based on the efficiency and effectiveness of information representation; higher efficiency means less redundant information in the representation, and higher effectiveness means that (more) higher-quality solutions can be found using the representation. An advantage of the regular-grid approach is its conceptual and practical simplicity. However, it is mostly very inefficient in terms of space and time requirements.
In our opinion it is advantageous to separate the routing model and the routing algorithm explicitly. Consequently, both items can be constructed, analyzed and improved separately. Also, we gain more insight into the properties of both items. In the context of mixed-signal layout generation we choose a two-step routing approach based on an irregular-grid model. The reasons for this choice are as follows (first two reasons why a two-step approach is preferred, followed by two reasons why an irregular-grid model is preferred):
² The difference with the classical divide-and-conquer algorithm is that we do not impose a uniform conquer strategy.
³ In fact, the classification could be generalized into multi-step versus single-step, but even in those cases the bisection is dominant.
- A two-step approach consisting of a global routing step followed by a detailed routing step enables controllable refinement of routing quality, which is advantageous in an iterative optimization framework such as simulated annealing.
- A two-step approach mitigates the problem of having to handle all routing-related issues, which typically have complex interdependencies, at the same time. By introducing hierarchy, the problems become more manageable.
- A routing strategy requires a routing model which has as little redundant information as possible and, at the same time, guarantees the existence of an optimal solution.
- An overall efficient integrated placement and routing approach requires a low-complexity algorithm to compute the necessary routing information from a given placement.
Furthermore, a very important additional requirement is an efficient update mechanism for (small) dynamic changes in the graph. For instance, after a small change in a placement due to a perturbation operation, it would be a waste of computation time to recompute the whole routing graph from scratch. Clearly, a significant gain is possible if the routing graph can be updated efficiently in an incremental sense. Below, the classifications based on routing hierarchy and routing model are discussed in more depth. Note that a choice with regard to the routing hierarchy is essentially independent of a choice with respect to the routing model.
- It is very difficult to predict whether or not a net is routable with specified quality margins, due to previously routed nets which form obstacles for succeeding nets. Consequently, it is almost impossible to solve the routing problem adequately, i.e. to find near-optimal solutions for all nets with respect to the input specifications, for all but the simplest problem instances.
- It is difficult to spread the wires evenly over the chip area while at the same time targeting good solutions for all nets.
- Computational complexity increases very rapidly due to forced ripup-and-reroute strategies in order to achieve compliance with specifications. Furthermore, the computational complexity is very hard to analyze and bound.
- From an algorithmic point of view, it is difficult to comprehend the impact of tunable algorithmic parameters on the output of a routing algorithm. As a consequence, possible improvements are found mainly by trial and error, which can overshadow possible fundamental improvements based on scientific insight.
As a side note, it is questionable whether fixating on details, while the global line has not been formulated yet, is a fundamentally sound approach. We argued that for large circuits the area routing approach is infeasible because of the complexities that are involved in managing all details on a global level. However, regional area routing approaches can give excellent results when the area that is routed in a single phase is bounded in size [103]. A practical implication of this fact is the importance of defining manageable regions for area routing. Essentially, this is accomplished by means of hierarchy.

Two-step Routing

An approach that uses hierarchy to split the overall problem into conceptually simpler sub-problems is the well-known two-step routing approach, consisting of a global routing step followed by a detailed routing step. Important advantages of two-step routing are as follows.
- The optimization of global routes is conceptually simpler without considering detailed routing aspects, and the optimization of detailed routes is conceptually simpler without considering global aspects.
- Algorithm design and analysis is simpler for spatially confined detailed routing problems. Moreover, the quality of routing results is more easily assessed as a function of algorithmic features.
- In the context of an iterative optimization framework, the possibility to trade off runtime against solution quality is of paramount importance; typically, detailed routing becomes increasingly important when the global routes approach the status of being good enough.

An example scenario is appropriate to illustrate the idea of hierarchical routing. Imagine that the total layout area is divided into segments by overlaying a grid on the plane. During global routing it is determined through which grid cells of the layout area a global route will go. Every grid cell has an associated capacity which limits the maximum number of
global routes that can pass through that cell⁵. The actual number of routes through a cell is called the demand. When the demand is larger than the capacity, we speak of routing congestion. Typically, more crucial nets are routed first, followed by less important nets, in order to satisfy imposed timing or wire-length constraints. After all global routing congestion has been resolved, the detailed routing step is started. During detailed routing, the actual geometric location of each wire is computed, guided by the global routing information. Generally, channel and switchbox generation is needed before the detailed routing step. It can occur that detailed routing is impossible with the given placement and global routing information. In such cases some of the global routes are removed and, with adjusted constraints, re-routed. This classical approach is called ripup-and-reroute. We stress here that the aforementioned hierarchical approach serves only to illustrate the idea and is not the approach we advocate in this thesis.

From the previous discussion it is clear that the best choice for mixed-signal layout generation is a two-step approach. Note that the routing models used in the two steps need not be equal.
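The notions of capacity, demand and congestion from the illustrative scenario above can be written down in a few lines. This is a minimal sketch under stated assumptions, not the thesis implementation: a global route is assumed to be a list of grid-cell coordinates, and the names `congested_cells` and `default_capacity` are hypothetical.

```python
from collections import Counter

def congested_cells(global_routes, capacity, default_capacity=1):
    """Return {cell: overflow} for every cell whose demand exceeds its capacity.

    global_routes: list of routes, each a list of (column, row) grid cells.
    capacity: dict mapping a cell to its capacity; other cells get
    default_capacity (an assumption made for this sketch).
    """
    demand = Counter()
    for route in global_routes:
        demand.update(set(route))  # a route loads each cell it crosses once
    overflow = {}
    for cell, d in demand.items():
        cap = capacity.get(cell, default_capacity)
        if d > cap:
            overflow[cell] = d - cap
    return overflow

routes = [
    [(0, 0), (1, 0), (2, 0)],
    [(1, 1), (1, 0)],
    [(1, 2), (1, 1), (1, 0)],
]
capacity = {(1, 0): 2, (1, 1): 2}
print(congested_cells(routes, capacity))
```

Here cell (1, 0) carries three routes against a capacity of two, so it is the only congested cell; a ripup-and-reroute step would have to move one of those routes.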
is that it is precise in terms of wire locations. The latter renders maze routing also a useful candidate for detailed routing. The plane space can also be used directly to find solutions to a routing problem, without considering each grid point or grid tile on a (partial) path separately. Finding an interconnecting network for multi-pin nets can be performed by solving the rectilinear Steiner minimal tree (RSMT) problem. Each (grid) point in the plane is then a candidate Steiner node. Hanan's theorem tells us that it suffices to consider only the grid points which overlap with the Hanan grid [98]. Since the RSMT problem is NP-hard [97], we can also use an approximation algorithm to approximate an RSMT in a running time that depends only on the number of pins to be connected [107]. Note that the complexity is independent of the grid size. Because the RSMT only considers pin locations and Hanan grid points (which generally have little in common with modules in the plane), it ignores obstacles (represented by modules) in the plane. Therefore, it violates our non-over-the-cell routing requirement. However, as we shall see in Section 7.7, RSMT routing solutions are typically less than 6% away from obstacle-avoiding routing solutions for a wide range of routing instances.

Irregular-Grid Routing Model

Motivated by the massive memory requirements of grid-based routing and the inability of plane-based routing to take obstacles into account (while guaranteeing a (near-)optimal solution if it exists), researchers have thought of ways to minimize memory usage without precluding optimal or near-optimal routing solutions. Routing based on graphs has been shown to be very efficient in terms of computational effort. Furthermore, general graph theory is a broad and active field of research, with many useful techniques and algorithms that can be exploited. The graph-based routing approach relies on the proper definition of nodes in the graph which correspond to locations in the plane.
The edges in the graph are used for routing. Each such edge represents a routing path segment which can be used for interconnecting a set of pins. Furthermore, weights can be associated with each edge to denote its importance, Manhattan length, capacity, or a combination thereof. In the extreme case where each node in the graph is a grid point and vice versa, the graph-based approach degenerates into a grid-based approach. The efficiency of the graph representation is directly affected by the efficiency of the grid representation since, typically, the computational complexity of a graph-based approach is expressed in the number of nodes and edges in the graph. Intuitively it is clear that an irregular grid is more efficient than a regular grid, since the former uses a denser grid at locations in the plane where it is needed and a sparser grid at locations where it is allowed. A practical bottleneck of this non-uniform manner of information representation is to find out where and by how much the grid is to be sparsified. Fortunately, efficient methods exist to perform this task, one of which is proposed in this thesis in Section 7.5.2. As a result, an irregular-grid routing approach is preferred over a regular-grid routing approach.
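To give a feeling for the node-count argument, the sketch below compares a unit-pitch regular grid with the Hanan grid of a pin set, which was mentioned earlier as a sufficient set of candidate Steiner points. The Hanan grid is used here only as a simple stand-in for an irregular grid; it is not the escape-line-based graph proposed in this chapter, and the function names are hypothetical.

```python
def hanan_grid(pins):
    """All intersections of the horizontal and vertical lines through the pins."""
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    return [(x, y) for x in xs for y in ys]

def regular_grid_size(pins):
    """Number of points of a unit-pitch grid covering the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

pins = [(0, 0), (40, 30), (100, 10)]
irregular = hanan_grid(pins)
print(len(irregular), regular_grid_size(pins))
```

For these three hypothetical pins the Hanan grid has 9 points, whereas the unit-pitch grid over the same bounding box has 101 x 31 = 3131 points; the irregular grid depends on the number of pins only, not on the dimensions of the plane.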
In this section, we provide a global overview of previous work in the routing field that is considered of interest to mixed-signal designs. Especially those methodologies that apply to analog circuit routing are eligible candidates for application to mixed-signal layout generation. However, we do not confine ourselves a priori to analog routing methodologies, since there are no fundamental reasons why methodologies used in the digital domain cannot be useful in the mixed-signal domain. Typical approaches in analog routing are based on area routing. That is, the exact wiring pattern of each net is determined, taking into account previously routed nets and analog constraints such as parasitic resistances, capacitances, and crosstalk. This approach is advocated in the works of Cohn et al. [6], Lampaert [8], and Malavasi and Sangiovanni-Vincentelli [108]. A promising approach in which the current through a wire is also taken into account to size the widths of interconnecting wires is reported by Adler and Barke [109]. Other area routing approaches, from a digital point of view, are described by Tseng [103]. The works that advocate an essentially two-step routing approach, with a refined detailed routing step that incorporates or can incorporate timing, crosstalk, and parasitics, are [41, 103, 110]. Generally, when a strategy is refined so as to take into account additional constraints related to performance degradation due to routing, it is called performance-driven routing [111, 112, 3].
thesis, but merely mention its importance to arrive at the final layout.
Figure 7.1: (a) A packing of ten blocks and (b) the derived global routing graph. After inserting the pins of a net we get (c) the extended global routing graph.

The global routing graph must be extended for each net to include the pins of that net and their escape line segments. An extended global routing graph is shown in Figure 7.1(c). Computing the global routing graph is not trivial. Cohoon and Richards proposed a line-sweep method for this purpose. It is not clear from their approach how pins and their associated escape segments can be dynamically inserted into and deleted from the escape graph. Since construction of a static global routing graph is of limited use in the present context, we propose a new dynamic method based on corner stitching and a hash table.
any substantial gain for our purpose. Another important issue is the ability of the global routing graph to represent effective solutions, i.e. solutions which are very close or equal to an optimal solution. Cohoon and Richards already showed that an optimal shortest path between any two nodes in the escape graph always exists [113]. More recently, Ganley proved the following interesting theorem [96]. Theorem 8 The extended global routing graph
This result is of interest when we want to find an optimal solution for a nontrivial multi-pin net, where the number of pins is larger than two.
1. Convert the sequence pair to a packing using constraint graphs.
2. Convert the packing to an equivalent corner-stitching data structure [52], incorporating all escape line segments.
3. Segment the perimeter of the corner-stitched modules into horizontal and vertical line segments and sort them by increasing coordinates, with one coordinate as the primary sort key and the other as the secondary sort key.
4. Sequentially insert the nodes and edges implied by the line segments into a hash table that holds the global routing graph. Nodes are inserted explicitly (based on their coordinates), and since each node can have at most 4 incident edges, all incident edges are kept explicitly within a node.
Figure 7.2: Construction of the global routing graph.

The computational complexity of step 1 is determined by the packing complexity of a from-scratch computation (see Chapter 6). For step 2 we have to insert each module sequentially into a corner-stitching data structure. All absolute module positions are known, so we can insert the modules efficiently using the depth-first search order of the modules in, say, the horizontal constraint graph. When all modules are put into an equivalent corner-stitching data structure, the exact locations of all line segments in the placement are known. Furthermore, due to the maximally horizontal empty tile property of corner stitching, all horizontal escape line segments are generated automatically. Step 3 comprises the enumeration and segmentation of each empty and non-empty tile in the corner-stitching data structure. The required complexity is proportional to the total number of empty and non-empty tiles. The number of non-empty tiles equals the number of modules; the total number of tiles can be considerably larger in the worst case, but it is typically proportional to the number of modules. The underlying reason is that in a typical placement, each module will be shielded by a number of surrounding modules from the rest. As a consequence, a typical escape line affects only a bounded number of surrounding tiles. Hence, the number of tiles per module is typically independent of the number of modules. Therefore, sorting of the line segments can typically be performed in $O(n \log n)$ time for $n$ modules.⁶ Finally, in step 4, all line segments can be traversed sequentially, each pair
⁶ Theoretically, this can be reduced to linear time with a sorting algorithm such as bucket sort [34], under the assumption that the line segments are (uniformly) randomly distributed. This results in a linear overall complexity of the algorithm.
generating at most three edges in the global routing graph. The latter property can easily be seen from the example segmentation of the right-side segment of module 5 and the left-side segment of module 6 in Figure 7.1(a), drawn with a dotted line. The resulting edges are shown in Figure 7.1(b). Insertion in the hash table requires constant expected time per node, with at most four edges being implicitly represented within each node. Summarizing, the complexity of the whole procedure is the sum of the complexities of the four steps above.
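The hash-table representation of step 4 can be sketched as follows. This is an illustration under stated assumptions rather than the data structure used in the thesis: a Python dictionary plays the role of the hash table, nodes are keyed by their coordinates, and the class name `RoutingGraph` and its methods are hypothetical. Each node stores its incident edges in at most four direction slots.

```python
class RoutingGraph:
    """Nodes keyed by (x, y); each node keeps at most four incident edges."""

    def __init__(self):
        self.nodes = {}  # (x, y) -> {direction: neighbor coordinate}

    def add_node(self, p):
        # Dictionary insertion takes constant expected time per node.
        return self.nodes.setdefault(p, {})

    def add_segment(self, p, q):
        """Insert the axis-parallel edge between points p and q."""
        (x1, y1), (x2, y2) = p, q
        if x1 == x2 and y1 != y2:
            d_pq, d_qp = ("N", "S") if y2 > y1 else ("S", "N")
        elif y1 == y2 and x1 != x2:
            d_pq, d_qp = ("E", "W") if x2 > x1 else ("W", "E")
        else:
            raise ValueError("segment must be horizontal or vertical")
        self.add_node(p)[d_pq] = q
        self.add_node(q)[d_qp] = p

    def edge_count(self):
        # Every edge is stored once in each of its two end nodes.
        return sum(len(edges) for edges in self.nodes.values()) // 2

# Hypothetical line segments forming a rectangle, processed in sorted order.
segments = [((0, 3), (2, 3)), ((0, 0), (2, 0)), ((2, 0), (2, 3)), ((0, 0), (0, 3))]
graph = RoutingGraph()
for p, q in sorted(segments):
    graph.add_segment(p, q)
print(len(graph.nodes), graph.edge_count())
```

Keeping the edges inside the node record means that moving to a neighboring node is a single dictionary lookup, which is what the incremental insertion and deletion of pins relies on.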
Figure 7.3: (a) A part of the global routing graph with a pin and its escape line segments. (b) The intermediate result after deleting the first escape segments and the pin. (c) The global routing graph after deleting the pin and its escape line segments.

A pin is attached to the graph through its escape line segments (edges), as shown in Figure 7.3(a). When we want to delete the pin, first its incident escape edge has to be deleted. Then we can delete the node associated with the pin. After deletion of a node, the two perpendicular edges at that position, if they exist, must be joined again into a single edge, unless the node must remain there because it is a module corner point or is induced by some module corner. If, after deleting the first escape edge, we have not hit an obstacle yet, we delete another edge and its (lower) incident node (Figure 7.3(b)). Finally, after deleting the last edge and its (lower) incident node, we notice that the other incident node is connected to a module. Thus, an obstacle has been found and the final node is deleted, after which the split edge is restored to its original condition (Figure 7.3(c)).
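The split-and-rejoin bookkeeping described above can be illustrated with a small sketch. It assumes a plain adjacency structure (a dictionary mapping a coordinate to a set of neighboring coordinates) instead of the corner-stitching-backed graph of the thesis, and the helper names `split_edge` and `delete_and_join` are hypothetical.

```python
def split_edge(adj, a, b, m):
    """Replace edge a-b by the edges a-m and m-b (m lies on segment a-b)."""
    adj[a].remove(b)
    adj[b].remove(a)
    adj.setdefault(m, set()).update((a, b))
    adj[a].add(m)
    adj[b].add(m)

def delete_and_join(adj, m, keep=frozenset()):
    """Delete node m and rejoin its two collinear neighbors into one edge.

    Nodes listed in keep (for example module corner points) are never removed.
    Returns True if the node was deleted.
    """
    nbrs = sorted(adj[m])
    collinear = len(nbrs) == 2 and (
        nbrs[0][0] == m[0] == nbrs[1][0] or nbrs[0][1] == m[1] == nbrs[1][1]
    )
    if m in keep or not collinear:
        return False
    a, b = nbrs
    del adj[m]
    adj[a].remove(m)
    adj[b].remove(m)
    adj[a].add(b)
    adj[b].add(a)
    return True

# One horizontal edge; a pin at (2, 3) drops an escape segment onto it at (2, 0).
adj = {(0, 0): {(4, 0)}, (4, 0): {(0, 0)}}
split_edge(adj, (0, 0), (4, 0), (2, 0))
adj[(2, 0)].add((2, 3))
adj[(2, 3)] = {(2, 0)}

# Deleting the pin: escape edge first, then the pin node, then rejoin the edge.
adj[(2, 0)].remove((2, 3))
del adj[(2, 3)]
delete_and_join(adj, (2, 0))
```

After the deletion sequence the adjacency structure is identical to the one before the pin was inserted, which mirrors the restoration shown in Figure 7.3(c).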
Insertion of a pin and its escape line segments into the global routing graph requires much more thought, since we do not know where an escape segment might cross another line (edge). At such a crossing, the edge needs to be split and a node has to be inserted. Fortunately, the corner-stitching data structure can mitigate the problem because it can find a closest point in constant time with a hint⁷, and going to a neighboring edge also takes constant time. Thus, constructing the extended graph after inserting a pin requires complexity essentially proportional to the number of escape line segments induced by the pin. And, following the same line of reasoning as above, this number is typically a constant. The total computational complexity required for performing all insertion and deletion steps is proportional to the number of escape line segments generated during insertion of the pin. From experiments with randomly generated placements, we found that the number of nodes is usually smaller than 15 times the total number of modules in the placement, and that the number of edges is usually smaller than 30 times the number of modules. Since randomly generated placements are normally quite sparse (containing a lot of slack space), global routing graphs associated with optimized placements are much smaller. Figure 7.4 gives an impression of the number of nodes and edges in global routing graphs derived from randomly generated placements for a wide range of instance sizes. It is clear from Figure 7.4(a) that the
number of nodes and edges in the global routing graph increases linearly with the placement instance size. Furthermore, Figure 7.4(b) shows that the average ratio of the number of edges to the number of nodes becomes approximately 2 for increasing instance size. This observation is important enough to put in a Claim.

Claim 1 The size of the global routing graph is a linear function of the number of modules in a random packing.

Figure 7.4: (a) The number of edges and nodes in the global routing graph for a wide range of randomly generated placements is an approximately linear function of the placement instance size. (b) The ratio of the number of edges to the number of nodes as a function of the placement instance size converges to approximately 2.
⁷ In this case a good hint would be the pointer to the last found module.
A direct consequence is stated by the following Corollary.

Corollary 3 The number of nodes and edges in a connected subgraph of the global routing graph is a linear function of the number of modules in the associated confined routing region.

Based on these results, we may conclude that pin insertion and pin deletion (including escape line segment processing) take essentially constant time, on average.

Dynamic Placement Change

A concern in a dynamic environment where placements change often is the complexity of updating the global routing graph after a placement change. Fortunately, the corner-stitching data structure allows for dynamically inserting and deleting modules in an efficient way. Clearly, the global routing graph can be updated directly when a module is inserted into or deleted from the corner-stitching (CS) data structure. This requires some localized operations on the graph, which can be performed efficiently with the aid of corner-stitching operations. A way to minimize computational complexity is to use a double-wave technique. The first wave clears the way for the second wave by deleting all affected modules. As soon as enough space has been cleared by the first wave, the second wave rebuilds the placement by inserting modules. This idea is shown in Figure 7.5.
Figure 7.5: (a) A placement change can lead to a set of affected modules located in the affected region; the remainder of the placement is the unaffected region. (b) The affected region is cleared by a clearing wave, directly followed (c) by a rebuilding wave that inserts the affected modules at their new locations, until the affected region has been fully rebuilt.
In the following subsections we give an overview of several existing heuristics, where we explicitly distinguish (exact) algorithms for 2-pin nets in Subsection 7.6.1 and (heuristic) algorithms for multi-pin nets, containing at least 3 pins, in Subsection 7.6.2 through Subsection 7.6.5. We also propose a few modified versions of these heuristics. The performance of these heuristics is compared with optimal solution values on a broad range of synthesized problem instances. The optimal solutions are obtained with the help of several state-of-the-art programs that incorporate advanced techniques [115, 13]. The synthesized problem instances are directly derived from actual sequence-pair-based placement results of randomly generated block sizes and nets. We choose the best-performing heuristic as the basis of a cost-constrained pin-to-pin global router. Furthermore, the heuristic is modified to incorporate incremental routing of partially changed routing segments, which can be, for instance, a consequence of placement-induced changes in the routing graph.
- The SSSP algorithm finds shortest paths between a single source pin and all (reachable) other nodes in the routing graph, whereas the A* algorithm finds a shortest path between a single source pin and a single target pin of a 2-pin net, thereby avoiding the exploration of a huge number of irrelevant nodes.
- The SSSP algorithm does not use a priori information on the location of the target pin, whereas the A* algorithm owes its efficiency to its target awareness.
We discuss both aforementioned algorithms briefly because they are used quite extensively in our framework. For instance, in the multi-pin net routing algorithms, two-pin routing problems are encountered and solved iteratively. Details will be given shortly.

Single-Source-Shortest-Paths (SSSP) Algorithm

A general-purpose version of Dijkstra's SSSP algorithm explores each node in the graph that is input to the algorithm. In case of a pin-to-pin path search problem it makes sense to stop when a shortest path connecting the two pins has been found. Therefore, we propose a modified version of Dijkstra's algorithm, which is shown in Figure 7.6. The difference is essentially due to the confinement of the routing space. Whereas Dijkstra's original algorithm processes all nodes and edges in the graph, our modified algorithm processes only the nodes and edges within the a priori defined relevant region. Hence, the computation time is significantly reduced. In the remainder of this thesis we will refer to the modified SSSP algorithm simply as the SSSP algorithm, unless explicitly noted otherwise. First, we describe the basic operation of the algorithm on a more intuitive level. Thereafter we give a more formal description. Assume all node distances are initially set to a very large value. We start exploring
the graph from the predefined source node. Each incident (weighted) edge is explored and the connecting nodes are conditionally put in a priority queue, keyed by their distance from the source node. We continue this process with the cheapest (shortest-distance) node extracted from the priority queue. Note that the extracted node might be a node that was discovered long before the expansion of the last node. One can see that the exploration of the nodes and edges occurs in a manner similar to an outward wave propagation induced by a drop of water falling at the source node location. The algorithm stops when the wave front hits the target node and a shortest path between source and target node has been established.
Input: routing graph and two pins to be routed
Output: shortest paths from the source pin to all nodes in the routing graph
[Pseudocode listing of Figure 7.6, lines 1-25]
Figure 7.6: Dijkstra's single-source-shortest-paths algorithm applied to a 2-pin problem instance. The general algorithm is modified to stop as soon as an optimal path is found. Note that this does not necessarily happen upon the first encounter of the target pin.

We proceed with a more formal discussion of the (modified) single-source-shortest-paths algorithm given in Figure 7.6. All nodes in the graph are initially set to a very large distance value, except for the source pin, which has its distance value set to zero at line 2. On line 3 the source pin is inserted into the priority queue. Then, in the loop from line 4 to line 14, the following actions are performed iteratively until either the queue becomes empty or the target pin is encountered. The cheapest node in the priority queue is extracted for expansion using the extract-min operation. Expanding a node means that all its adjacent nodes are explored. If the distance value of an adjacent node is larger than the distance value of the expanding node plus the weight of the edge connecting the two, then the adjacent node is relaxed on line 9. Relaxing a node means that its distance field is decreased to the lowest currently known distance value to that node.
In case a relaxation step is performed, the relaxed node will have its parent pointer set to the expanding node. These parent pointers are useful for backtracking the shortest path nodes and edges when the algorithm finishes. Moreover, if a relaxed node is not in the queue yet, then it is inserted into the queue on line 11. The loop from line 15 to line 25 makes sure that the path found between the source and target pins is indeed shortest. In order to guarantee this, the algorithm extracts the nodes in the queue which could possibly lead to a shorter path. When a node is extracted from the queue with a distance not smaller than the current distance value of the target pin, it is clear that no improvement is possible and the algorithm can stop. It can be verified that when the weight function is positive, which is the case when the weight of an edge is equal to its Euclidean length, the algorithm is guaranteed to find an optimal path. Furthermore, the worst-case computational complexity is that of Dijkstra's original algorithm [34].
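As a rough illustration of the early-terminating search described above, the following sketch uses the textbook variant that stops when the target is extracted from the queue, at which point its distance is final for positive edge weights. It is not a transcription of Figure 7.6. The function name `shortest_path` and the optional `region` argument, which stands in for the a priori defined relevant region, are assumptions of this sketch.

```python
import heapq

def shortest_path(adj, source, target, region=None):
    """Dijkstra search that stops once the target is extracted.

    adj maps node -> list of (neighbor, positive weight).
    region, if given, is the set of nodes the search may visit.
    Returns (path, length), or (None, inf) if the target is unreachable.
    """
    dist = {source: 0}
    parent = {source: None}
    closed = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in closed:
            continue  # stale entry
        closed.add(u)
        if u == target:
            break  # distance of the target is final now
        for v, w in adj[u]:
            if region is not None and v not in region:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd      # relax v
                parent[v] = u     # remember where we came from
                heapq.heappush(heap, (nd, v))
    if target not in closed:
        return None, float("inf")
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], dist[target]

# Hypothetical graph in which the target is first reached over a longer path.
adj = {
    "s": [("a", 1), ("b", 2)],
    "a": [("s", 1), ("t", 5), ("b", 3)],
    "b": [("s", 2), ("t", 2), ("a", 3)],
    "t": [("a", 5), ("b", 2)],
}
print(shortest_path(adj, "s", "t"))
```

In the example the target is first encountered from node a with distance 6, but it is extracted only after node b has improved its distance to 4, which illustrates why stopping on the first encounter would not be safe.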
A* Algorithm
The A* algorithm is essentially a generalized best-first search strategy. Algorithms of this type are also called labeling algorithms because status labels are attached to nodes during execution. If a node is a candidate for expansion, then we label it OPEN. If a node has been expanded, it is labeled CLOSED. Labeling is not strictly required in an implementation, because making a node part of a set can be effectively equivalent to labeling. But for the sake of clarity and ease of discussion, labels can be very useful, and we will use labeling wherever appropriate. In the rest of this thesis we assume that the global routing graph is connected. Before discussing A* we state some definitions. The cheapest cost of a path between two nodes $n$ and $n'$ is denoted by $k(n, n')$. In general we will use a superscript asterisk, as in $h^*$, to indicate a function which yields an optimal value, and a hat, as in $\hat{h}$, to indicate an estimating function. The algorithmic steps of A* are shown in Figure 7.7. The algorithm operates as
Input: routing graph and two pins to be routed
Output: a shortest path connecting source and target node
[Pseudocode listing of Figure 7.7, lines 1-18]

Figure 7.7: The A* algorithm.
follows. Initially, all nodes are labeled INITIAL and the backtracking parent fields of all nodes are cleared. The algorithm starts by labeling the source node $s$ OPEN and putting it in the queue $Q$. By definition, all elements in $Q$ are labeled OPEN. Then the algorithm proceeds by selecting from $Q$ the best node, i.e. the node $n$ with the smallest distance value $\hat{f}(n)$, in the sense defined by

$$\hat{f}(n) = \hat{g}(n) + \hat{h}(n) \qquad (7.1)$$

where

- $\hat{g}(n)$ is the sum of edge costs along the current path of pointers from $n$ to the source node $s$;
- $\hat{h}(n)$ is the estimate of the cheapest cost of paths going from node $n$ to the target node $t$.
If the selected node $n$ is the target node $t$, then we have found an optimal path and the algorithm terminates. If $n \neq t$, then all neighboring nodes of $n$ are evaluated with respect to their current shortest path distance to $s$ and their estimated distance to $t$, and, if appropriate, backtracking information is updated and nodes are (re-)inserted into the queue (or, equivalently, labeled OPEN). This procedure is repeated until the target node is found, which is guaranteed. If we choose $\hat{h}(n) \le h^*(n)$ for all $n$, where $h^*(n)$ is the cheapest cost of a path from $n$ to $t$, then (by definition) $\hat{h}$ is admissible, i.e. it is guaranteed that A* will yield an optimal solution [100, 101]. However, if we want to measure the effectiveness of A* by its ability to exclude as many nodes as possible from expansion, then admissibility alone is not sufficient. As proven in [101, 116], A* never reopens a CLOSED node under the following consistency condition:

$$\hat{h}(n) \le k(n, n') + \hat{h}(n') \quad \text{for all } n, n' \qquad (7.2)$$

where $k(n, n')$ is the cheapest cost of a path between $n$ and $n'$.
Note that consistency implies admissibility. If (7.2) holds, we have the following useful property [100]:

$$\hat{g}(n) = g^*(n) \quad \text{for all } n \in \text{CLOSED} \qquad (7.3)$$

which means that the backtracking path from every CLOSED node to the source node is a least-cost path. For our rectangle-packing-derived global routing graph, we use

$$\hat{h}(n) = d_M(n, t) \qquad (7.4)$$

where $d_M(n, t)$ denotes the Manhattan distance between the target node $t$ and $n$. Note that the distance measure need not necessarily be Euclidean or Manhattan in general. Actually, researchers tend to use a different (nonlinear) metric based on congestion and specific circuit constraints [108, 103]. However, this usually violates the consistency property, thus increasing the practical average complexity of A*. It can be verified (using case distinctions) that (7.4) preserves consistency. The average computational complexity of the A* algorithm is substantially better than that of the SSSP algorithm. Typically, the complexity is proportional to the total number of nodes on the shortest path between the routed pins. In case we have $N$ modules in the relevant routing region, this results in $O(\sqrt{N})$ average computational complexity using Claim 1 and Corollary 3.⁸ Although A* is better than SSSP in terms of computational complexity, it is not always suitable for finding a 2-pin shortest path. For instance, in cases where the location of the target node is unknown, A* cannot be used and we have to resort to the SSSP algorithm.
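The search with a Manhattan-distance estimate can be sketched as follows. This is a minimal illustration, not the listing of Figure 7.7: nodes are assumed to be coordinate pairs, every edge is assumed to be axis-parallel with a weight equal to its Manhattan length (which keeps the estimate consistent), and the names `a_star`, `manhattan` and `grid_graph` are hypothetical. Because the estimate is consistent, CLOSED nodes are never reopened, so the sketch simply skips them.

```python
import heapq

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def a_star(adj, source, target):
    """Return (path, cost, closed) for a shortest source-target path.

    adj maps a coordinate to a list of neighboring coordinates.
    """
    g = {source: 0}                 # cost of the best path found so far
    parent = {source: None}
    closed = set()
    heap = [(manhattan(source, target), source)]   # keyed by g + h
    while heap:
        _, n = heapq.heappop(heap)
        if n in closed:
            continue
        closed.add(n)               # label n CLOSED
        if n == target:
            break
        for m in adj[n]:
            cand = g[n] + manhattan(n, m)   # edge weight = Manhattan length
            if cand < g.get(m, float("inf")):
                g[m] = cand
                parent[m] = n
                heapq.heappush(heap, (cand + manhattan(m, target), m))  # OPEN
    if target not in closed:
        return None, float("inf"), closed
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], g[target], closed

def grid_graph(width, height, blocked):
    """Unit grid with some points removed, used only as test data."""
    nodes = {(x, y) for x in range(width) for y in range(height)} - set(blocked)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {n: [(n[0] + dx, n[1] + dy) for dx, dy in steps
                if (n[0] + dx, n[1] + dy) in nodes] for n in nodes}

adj = grid_graph(5, 3, blocked={(2, 0), (2, 1)})
path, cost, closed = a_star(adj, (0, 0), (4, 0))
print(cost, path)
```

In the example a wall at x = 2 forces a detour over the top row, so the path costs 8 although the Manhattan distance between the pins is 4. The returned `closed` set lets one inspect how many nodes were expanded.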
Figure 7.8: Two illustrative examples of the coarseness of minimal-bounding-box routing estimation, comparing the half-perimeter-length path with the optimal-length path; (a) shows a 2-pin net, (b) shows a 3-pin net.

The total wire length is calculated by summing up the half-perimeter lengths for all nets. The computational complexity of this method is given by

$$\sum_{i} O(p_i) \qquad (7.5)$$

where $p_i$ is the number of pins in net $i$. Since each pin is in exactly one net, the total complexity is $O(p)$, where $p$ is the total number of pins. This computational complexity is very low. However, a major drawback of this method is the poor accuracy of the estimation. Therefore, it is not appropriate for use in our optimization framework, in which speed and accuracy are of utmost importance.
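The minimal-bounding-box estimate is simple enough to state in a few lines. The sketch below assumes that a net is given as a list of pin coordinates; the function names are hypothetical.

```python
def half_perimeter(pins):
    """Half-perimeter length of the bounding box of one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets):
    """Sum of the half-perimeter estimates over all nets; linear in the pin count."""
    return sum(half_perimeter(pins) for pins in nets)

nets = [
    [(0, 0), (3, 4)],           # 2-pin net: 3 + 4 = 7
    [(1, 1), (4, 1), (2, 5)],   # 3-pin net: 3 + 4 = 7
]
print(total_wirelength(nets))
```

Each pin is inspected once, which is where the linear complexity comes from. The estimate ignores obstacles entirely, which is the source of the inaccuracy illustrated in Figure 7.8.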
⁸ We assume here that the relevant routing region is roughly square. If this is not the case, the average computational complexity is expected to be approximately linear in the number of modules.
where $x_i$ is the $x$-coordinate of node $i$ and $y_i$ is the $y$-coordinate of node $i$. As we now have a graph in which the number of pins to be connected is equal to the number of nodes in the graph, Prim's minimum spanning tree algorithm can find a minimal tree connecting all nodes in $O(|E| + |V| \log |V|)$ time, which can also be written as $O(|V|^2)$ because the graph is complete. In terms of all nets to be routed the total computational complexity is

$$\sum_{i} O(p_i^2) \qquad (7.6)$$

where $p_i$ is the number of pins in net $i$.
Note that the above procedure does not directly yield a solution in the form of a subgraph of the original graph. Additional steps must be taken for this. Very recently, an efficient line-sweep algorithm was introduced which computes a rectilinear MST of $n$ points in the plane in $O(n \log n)$ time [117], without the use of Delaunay triangulation. The latter is a well-known method for Euclidean MST computation in $O(n \log n)$ time, but it is not well defined for the Manhattan distance measure. Unlike the MBB estimation method, the MST error is bounded. Hwang [118] proved that the ratio of rectilinear MST cost to rectilinear SMT cost is never more than 3/2. However, experimental results reported in [119] indicate that the difference between the rectilinear MST cost and a solution produced by a good rectilinear Steiner minimal tree heuristic is more than 10% on average. This implies that the difference between the MST cost and the cost of an optimal routing solution in a graph is significantly more than 10%.
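The MST-based estimate over the complete graph of pins can be sketched with the array-based form of Prim's algorithm, which is quadratic in the number of pins of a net. This is an illustrative sketch, not the line-sweep method of [117]; the function name is hypothetical.

```python
def rectilinear_mst_cost(pins):
    """Cost of a minimum spanning tree over the pins under Manhattan distance."""
    n = len(pins)
    if n == 0:
        return 0
    best = [float("inf")] * n   # cheapest known connection to the tree
    in_tree = [False] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the cheapest pin that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = abs(pins[u][0] - pins[v][0]) + abs(pins[u][1] - pins[v][1])
                if d < best[v]:
                    best[v] = d
    return total

cross = [(0, 1), (2, 1), (1, 0), (1, 2)]
print(rectilinear_mst_cost(cross))
```

For the four pins of this cross-shaped example every pair of pins is at Manhattan distance 2, so the MST costs 6, whereas a Steiner tree through the center point (1, 1) costs 4. This instance therefore attains the 3/2 ratio mentioned above.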
Routing
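1. Begin with a subtree of consisting of a single pin, say .
2. Find a pin, say , such that
(7.7)
Construct tree by adding a shortest path from the current tree to this pin.
3. If unconnected pins remain, go to 2.

Figure 7.9: The shortest-paths heuristic. The number of pins is .

As noted by Rayward-Smith and Clare [121], the final shortest-paths tree can be improved by two additional steps.
4. Determine a minimum spanning tree for the sub-network of induced by the nodes of the shortest-paths tree.
5. Delete from this minimum spanning tree all non-pins of degree 1 in a sequential manner.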
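Steps 1 to 3 of the shortest-paths heuristic can be sketched as below. This is a straightforward version under stated assumptions: the routing graph is an adjacency dictionary with positive edge lengths, every iteration restarts a multi-source Dijkstra search from the whole current tree, and the improvement steps 4 and 5 are omitted. The names `undirected` and `shortest_paths_heuristic` are hypothetical.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Sequence, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def undirected(edges: Sequence[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build an undirected adjacency dictionary from (node, node, length) triples."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_paths_heuristic(graph: Graph, pins: Sequence[Hashable]):
    """Grow a tree from pins[0]; always add a shortest path to the closest unconnected pin."""
    tree_nodes = {pins[0]}
    tree_edges: List[Tuple[Hashable, Hashable]] = []
    remaining = set(pins) - tree_nodes
    total = 0.0
    tie = itertools.count()  # avoids comparing node objects when keys are equal
    while remaining:
        dist = {v: 0.0 for v in tree_nodes}
        parent: Dict[Hashable, Hashable] = {}
        heap = [(0.0, next(tie), v) for v in tree_nodes]
        heapq.heapify(heap)
        settled = set()
        target = None
        while heap:
            d, _, u = heapq.heappop(heap)
            if u in settled:
                continue
            settled.add(u)
            if u in remaining:  # first *extracted* pin: its distance is final
                target = u
                break
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    parent[v] = u
                    heapq.heappush(heap, (d + w, next(tie), v))
        if target is None:
            raise ValueError("some pins cannot be reached from the tree")
        total += dist[target]
        node = target
        while node not in tree_nodes:  # backtrack to the tree
            tree_edges.append((parent[node], node))
            tree_nodes.add(node)
            node = parent[node]
        remaining.discard(target)
    return total, tree_edges
```

Because a pin is accepted only when it is extracted from the queue, and not when it is first seen, the added path is a shortest one from the current tree to that pin.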
Although the last two steps can improve the quality of the solution, they impose a substantial increase in computation time. From experiments, we found that the improvement is usually negligible in our framework. The underlying reason is that the sub-network induced by the nodes of is not likely to contain better solutions if the pins lie relatively far from each other (in terms of intermediate nodes) and the number of alternative equally good paths is small. Generally, the shortest-paths heuristic yields good results [19, 122]. Furthermore, the worst-case computational complexity is
for a single net with pins. With the knowledge that for a planar graph the relationship holds, the worst-case computational complexity for all nets can be written as
(7.8)
Typically the complexity is significantly lower because the addition of a new pin to the Steiner tree normally induces only a relatively small number of nodes, proportional to the number of nodes in the shortest path segment, to be re-processed. Furthermore, the error ratio (see footnote 9) is , where is the number of pins in a net.

Shortest-Paths-Based Heuristic I (SPBH I)

The aforementioned shortest-paths heuristic does not add a pin to the currently built tree as soon as it finds a new pin. Instead, it makes sure that the path it adds to the current tree is indeed a shortest one over all possible paths from the current tree to this pin. A way to guarantee this condition is to postpone the addition of the currently found path to pin until all edges connected to have been explored, which implies that no improvement in path length is possible from the current tree to pin . We propose a modification of the aforementioned algorithm which entails adding a shortest path to a pin as soon as we encounter this pin. This is essentially a greedy approach. We name this algorithm the shortest-paths-based heuristic I (SPBH I).

Furthermore, we observe that an implementation of the shortest-paths heuristic of Figure 7.9 typically uses a priority queue to store the candidate nodes before extracting the cheapest ones (one at a time) during the pin search. As proven by Huijbregts [123, Corollary 4.1], upon first extraction of a candidate node from the queue to reach pin , the actual shortest path to is established through some node that resides in the queue, which is not necessarily the first extracted one to reach . Therefore, a greedy approach does not comply with the SPH condition of (7.7). The algorithmic steps of SPBH I are shown in Figure 7.10; the description of the algorithm is self-explanatory.
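The greedy behavior of SPBH I can be sketched as follows. This is an illustrative reading of the steps of Figure 7.10, not the thesis code: it uses a binary heap with lazy deletion instead of a splay tree, assumes positive edge lengths, and re-queues every node that joins the tree with key 0 so that it is expanded again. All names are hypothetical.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Sequence, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def undirected(edges: Sequence[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build an undirected adjacency dictionary from (node, node, length) triples."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def spbh1(graph: Graph, pins: Sequence[Hashable]):
    """Connect a pin to the tree as soon as it is encountered (greedy variant of SPH)."""
    start = pins[0]
    unconnected = set(pins) - {start}
    in_tree = {start}
    dist = {start: 0.0}
    parent: Dict[Hashable, Hashable] = {}
    closed = set()
    tie = itertools.count()
    heap = [(0.0, next(tie), start)]
    edges: List[Tuple[Hashable, Hashable]] = []
    total = 0.0
    while unconnected and heap:
        d, _, u = heapq.heappop(heap)          # step 2: extract the cheapest node
        if u in closed or d > dist[u]:
            continue                            # stale queue entry
        closed.add(u)
        for v, w in graph[u].items():
            if v in closed or v in in_tree:
                continue
            if v in unconnected:                # step 4: pin found, connect it at once
                parent[v] = u
                node = v
                while node not in in_tree:      # backtrack via the parent nodes
                    p = parent[node]
                    edges.append((p, node))
                    total += graph[p][node]
                    in_tree.add(node)
                    dist[node] = 0.0            # tree nodes get distance 0
                    closed.discard(node)
                    heapq.heappush(heap, (0.0, next(tie), node))
                    node = p
                unconnected.discard(v)
                closed.discard(u)               # u is in the tree now; expand it again
                heapq.heappush(heap, (0.0, next(tie), u))
                break
            if d + w < dist.get(v, float("inf")):   # step 3: relax
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, next(tie), v))
    if unconnected:
        raise ValueError("some pins cannot be reached from the tree")
    return total, edges
```

On a small hypothetical graph with edges s-a (5), s-b (13), a-z (51) and b-z (39) and pins s and z, this sketch connects z through a as soon as a is expanded, giving a length of 56, whereas a non-greedy search would wait and find the path through b of length 52.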
It is clear that the computational complexity of SPBH I is never more than that of the original shortest-paths heuristic. Figure 7.11 shows an example which demonstrates the different strategies of SPH and SPBH I. The purpose of this example is to show that the greedy behavior of SPBH I can yield worse solutions than the non-greedy behavior of SPH. However, as we will see from the experimental results, the
9 The error ratio is defined to be the quotient of the worst-case solution quality and the optimal solution quality.
1. Select an arbitrary pin from which to grow tree . Assign a value of 0 to the distance field of and insert into the empty priority queue . (Every other node in has its distance field set to and its status is .)
2. Extract a node with minimum distance field from the queue . Change the status of node to . Explore every node , which does not have status , adjacent to . For each such node change the status to . If then go to step 4, otherwise go to step 3.
3. If the distance field associated with is larger than the distance field of plus the length of edge , then relax the distance field of to the latter sum. Tag as the parent node of and store node in the queue if it is not already there. Go to step 2.
4. Backtrack the path from node to by traversing the parent nodes. Meanwhile, add all traversed nodes and edges to . Set the distance field of each traversed node to 0. If then stop, else go to step 2.

Figure 7.10: The shortest-paths-based heuristic I (SPBH I). The number of pins is .
average solution quality of SPBH I is significantly better than that of SPH for a wide range of problem instances. The explanation of Figure 7.11 is as follows. We want to connect all
solid black circles which constitute the pins of this net. The white circles are regular nodes. We start with pin and Figure 7.11(a) shows the snapshot of the situation in which pins and have just been processed and added to the shortest-paths tree . Consequently, all pins and nodes in reside in the priority queue with their distance (key) values set to zero. Subsequently, the cheapest element is extracted from the queue and that node (or pin) is expanded. First, all nodes and pins in tree are extracted from the queue since they have a zero key value. During this process, pins and are also extracted and expanded. As a result, nodes and will be explored and put in the queue keyed by their distance from tree . Thus, node has key 13 and node has key 5. This situation is shown in Figure 7.11(b), where the backtracking arrows are also drawn. Only nodes and reside in the queue at this moment, with keys 13 and 5, respectively. Therefore, node is extracted and expanded. We find pin at distance 56 from . Algorithm SPBH I stops at this point and returns the routing solution depicted by thick lines in Figure 7.11(c). However, algorithm SPH will not stop once is encountered for the first time. Instead, it will find a shortest path via node because , where is the length of path - , and is the length of path - . Clearly, the routing solution found by algorithm SPH is better than the one found by algorithm SPBH I. Despite this pessimistic scenario, the average performance of SPBH I compared to SPH in terms of routing quality is remarkably good, as will be shown in Section 7.7.

Figure 7.11: The difference in routing strategy between SPH and SPBH I results in different routing solutions. Starting from the situation drawn in (a) and subsequently (b), this example demonstrates the case in which (c) SPBH I is outperformed by (d) SPH.

Shortest-Paths-Based Heuristic II (SPBH II)

Another variant of the original shortest-paths heuristic is as follows. Instead of determining a closest pin to the current tree by means of exploring adjacent nodes in an iterative manner, we estimate the distance to a closest-distance pin using the Manhattan distance measure. We call this variant of the original shortest-paths heuristic the shortest-paths-based heuristic II (SPBH II). The algorithm is given in Figure 7.12. By we denote the minimal
1. Begin with a subtree of consisting of a single pin and .
2. Find a pin in , say , such that . Construct tree by adding to the previous tree, i.e. set nodes in , edges in .
3. If and go to 2.

Figure 7.12: The shortest-paths-based heuristic II (SPBH II). The number of pins is .
Manhattan distance between all pairs of nodes , . A significant improvement in run-time is obtained if the point-to-point routing algorithm [100, 101] is used instead of Dijkstra's point-to-multipoint algorithm [99]. The shortest-paths-based heuristic II is somewhat more expensive, with respect to computational complexity, than SPH although the algorithm is fundamentally faster than any other shortest-path algorithm. The underlying reason is that a straightforward implementation of the proposed algorithm processes all nodes in the currently built tree when evaluating . Consequently, the actual computational complexity is
evaluate
(7.9)
(7.10)
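The structure of SPBH II can be sketched as follows. The sketch makes several assumptions that are not taken from the thesis: every node has (x, y) coordinates, each edge is at least as long as the Manhattan distance between its end points (so that this distance is an admissible, consistent estimate), the point-to-point search is a textbook A* search, and the search runs from the selected pin and stops at the first tree node it settles. The selection step deliberately scans all tree nodes, which is the straightforward implementation whose cost is discussed above.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]
Coords = Dict[Hashable, Tuple[float, float]]


def manhattan(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar_to_tree(graph: Graph, coords: Coords, source, target, tree: Set) -> List:
    """A* from source, guided towards target; returns the path [tree node, ..., source]."""
    tie = itertools.count()
    g = {source: 0.0}
    parent: Dict[Hashable, Hashable] = {}
    heap = [(manhattan(coords[source], coords[target]), next(tie), source)]
    closed = set()
    while heap:
        _, _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)
        if u in tree:                       # reached the current tree: backtrack
            path = [u]
            while u != source:
                u = parent[u]
                path.append(u)
            return path
        for v, w in graph[u].items():
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng
                parent[v] = u
                f = ng + manhattan(coords[v], coords[target])
                heapq.heappush(heap, (f, next(tie), v))
    raise ValueError("pin cannot reach the tree")


def spbh2(graph: Graph, coords: Coords, pins: Sequence[Hashable]):
    """Pick the pin that looks closest in Manhattan distance, then route point-to-point."""
    tree = {pins[0]}
    edges = set()
    unconnected = set(pins) - tree
    while unconnected:
        # Manhattan estimate over all (tree node, unconnected pin) pairs.
        _, t, z = min(
            ((manhattan(coords[t], coords[z]), t, z) for t in tree for z in unconnected),
            key=lambda cand: cand[0],
        )
        path = astar_to_tree(graph, coords, z, t, tree)
        edges.update(zip(path, path[1:]))
        tree.update(path)
        unconnected.difference_update(path)  # pins met on the way are connected too
    total = sum(graph[a][b] for a, b in edges)
    return total, edges
```

Stopping at the first settled tree node keeps the result a tree, since no intermediate node of the new path belongs to the tree already.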
In case the pins are spread over the entire routing graph , we obtain a worst-case scenario. However, in most cases the routing region of interest can be confined and is therefore substantially smaller. This implies that . Note that in virtually all cases .

Repetitive Shortest-Paths Heuristics

For comparison purposes we also cover repetitive shortest-paths heuristics. A repetitive heuristic is not innovative in the sense that it makes use of a clever technique. On the contrary, it repeats a known heuristic over a given set of initial conditions. The solution of the repetitive heuristic is then the best solution returned by the underlying shortest-paths heuristic over these initial conditions. Essentially, the class of repetitive variants is quite broad since in each heuristic some choice is made at some point in the algorithm. This can be an arbitrary choice, a greedy choice, or a choice based on some heuristic. Of course, a repetitive approach is only practical if a significant improvement in routing quality is obtained at the cost of a (preferably small) increase in run-time. The trade-off between the two depends on the application.

When naively implemented, the overall run-time of a repetitive shortest-paths heuristic grows linearly with the number of repetitions. Therefore, a substantial amount of effort must be spent on clever techniques that exploit the incremental philosophy: only re-compute information when it is strictly necessary. From our point of view, the foremost reason for studying repetitive heuristics is to assess the practical improvement in routing quality. If the improvements are substantial, it proves worthwhile to consider the design of an efficient repetitive heuristic.

Winter and Smith [122] have proposed a class of repetitive shortest-paths heuristics and conducted extensive experiments with them. Summarizing, this class consists of the following variants:
SPH-N: determine times, each time beginning with a different pin.
SPH-V: determine times, each time beginning with a different node.
SPH-zN: determine times, each time beginning with a shortest path from a fixed pin to another pin , .
SPH-NN: determine times, each time beginning with a shortest path between a different pair of pins.
The investigations of Winter and Smith showed that SPH-V and SPH-NN perform particularly well on all instances from a large set of randomly generated problem instances. The quality improvements are of course paid for by longer computation times. A possible method to reduce computation time while preserving routing quality is heuristic identification of a good start. This could for instance be accomplished by close (visual) examination of many practical routing results. However, this falls outside the scope of this thesis.
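The naive form of a repetitive heuristic, whose run-time grows linearly with the number of repetitions, amounts to a simple wrapper around the underlying heuristic. The sketch below is generic; the name `best_of_restarts` and the convention that the heuristic returns a (cost, solution) pair are assumptions for this example.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Start = TypeVar("Start")
Solution = TypeVar("Solution")


def best_of_restarts(
    heuristic: Callable[[Start], Tuple[float, Solution]],
    starts: Iterable[Start],
) -> Tuple[float, Solution, Start]:
    """Run the heuristic once per initial condition and keep the cheapest result."""
    best = None
    for start in starts:
        cost, solution = heuristic(start)
        if best is None or cost < best[0]:
            best = (cost, solution, start)
    if best is None:
        raise ValueError("at least one initial condition is required")
    return best
```

Passing every pin as an initial condition corresponds to starting each time with a different pin, and passing `itertools.combinations(pins, 2)` corresponds to starting each time with a different pair of pins. The wrapper re-computes everything for every start, so it does not exploit the incremental philosophy mentioned above.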
1. Begin with the set of unconnected pins of .
2. Determine for each , and select a node for which is minimal. The function measures the average distance between the unconnected subtrees in , denoted by , , and all nodes in .
3. Construct tree by adding the path from the closest subtree in to to the previous tree, i.e. set nodes in , edges in .
4. If and go to 2.
Conceptually, the algorithm works as follows. We start with a set of unconnected pins. The idea is now to constructively connect pins to each other such that the number of unconnected pins is decreased. Since we connect two pins during each iteration, after a finite number of iterations all pins are connected. ADH distinguishes itself in the choice of which subtrees, each containing at least one pin, to connect and how they should be connected. The average-distance node is the node which has the smallest average distance to all currently constructed subtrees. After such a node is computed, the closest subtree and the second-closest subtree to that node are connected via two paths originating from the average-distance node. As a consequence of this step, both subtrees are merged into a single subtree connecting all pins it contains. The average-distance node is computed again, and the previous steps are repeated until all subtrees are connected.
10 Non-pins
Two additional steps, identical to steps 4 and 5 of the shortest-paths heuristic, can be applied to further improve the solution [121]. Formally, we can define the average distance function by
(7.11)
where is the shortest path distance between node and the currently built subtree . Furthermore, is the iteration index as defined in the algorithm above, and is the total number of pins.

A fast implementation of ADH is due to Chang and Lee [4]. Their main contribution is the identification of circumstances in which more than two subtrees can be joined together in a single iteration, hence reducing the total number of iterations. Nonetheless, the computational complexity of ADH is dominated by the evaluation of the average-distance function (7.11) during each iteration. It can be verified that ADH has computational complexity for planar graphs [121]. Note that this complexity is a function of the total (confined) routing graph size and not the number of pins. A significant portion of this complexity can be attributed to the computation of shortest paths. Motivated by the urge to reduce computational complexity, we propose a modified version of ADH hereafter, which we call the average-distance-based heuristic (ADBH).

Average-Distance-Based Heuristic (ADBH)

The essential difference between the previously discussed ADH and the modified algorithm which we propose here is the use of the Manhattan distance measure as used in
(7.12)
instead of the shortest-path distance measure used in (7.11). In addition, we use the algorithm to find an actual shortest path between a source and a target node. By virtue of faster Manhattan distance approximation and the use of the algorithm to find a shortest path, we can reduce the worst-case total computational complexity somewhat. This can be seen as follows. The time taken by the operation to find the Manhattan distance between a node and a subtree is proportional to the number of nodes in this subtree. The maximum size of a subtree is of course never larger than the total number of nodes in the entire graph. Since all subtrees are disjoint, the total complexity for evaluating the distance from a given node to all subtrees is . Function is evaluated times for all nodes not in to all nodes in the subtrees. Furthermore, the addition of the best average-distance node and the (two) paths leading to that node takes in the worst case. This is done times. Consequently, the total computational complexity is . Note that and normally does not depend on . Because of this approximation it makes more sense to compare the results of ADBH with SPBH II instead of comparing it to the original SPH.

As a final remark, we note that the average distance of a node in only changes due to the change in distance to a merged (and extended) subtree . This fact can be exploited to yield a more efficient overall algorithm. However, this issue is not explored in this thesis.
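The Manhattan distance between a node and a subtree, whose cost is proportional to the number of nodes in the subtree, can be sketched as follows, together with the selection of the closest and second-closest subtree of a candidate node. The representation of a subtree as a list of (x, y) coordinates and both function names are assumptions for this example; the sketch does not reproduce the average-distance function (7.12) itself.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def distance_to_subtree(node: Point, subtree: Sequence[Point]) -> int:
    """Smallest Manhattan distance from node to any node of the subtree (linear scan)."""
    return min(manhattan(node, t) for t in subtree)


def two_closest_subtrees(node: Point, subtrees: Sequence[Sequence[Point]]) -> List[int]:
    """Indices of the closest and second-closest subtree of node.

    Since the subtrees are disjoint, scanning all of them touches every
    subtree node exactly once.
    """
    ranked = sorted(range(len(subtrees)), key=lambda i: distance_to_subtree(node, subtrees[i]))
    return ranked[:2]
```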
minimal bounding box (MBB), shortest-paths heuristic (SPH), shortest-path-based heuristic I (SPBH I), shortest-path-based heuristic II (SPBH II), and average-distance-based heuristic (ADBH).
Since worst-case performance of a heuristic can give an overly pessimistic indication of practical performance, and different heuristics perform differently on different problem instances, it is necessary to experimentally evaluate these heuristics on a set of representative problem instances. We first define the problem instances which we use to benchmark the routing heuristics. Then we evaluate these heuristics with respect to the following points:
solution cost, percentage deviation from the optimal solution cost, computation time.
From these values we can derive some implications with respect to the following issues: worst-case solution quality as implied by the error ratio versus practical solution quality; computational complexity versus practical performance;
set of difficult graphs, will also perform well on the easier graphs which are generated during the final stages of placement optimization. Thus, the results should give a good indication of typical routing performance. Table 7.1 shows the benchmark set of global routing graph instances we have defined, along with optimal routing solutions. The numbers shown in the shaded area are best known upper bounds at the time of writing, while the other numbers are optimal values. The problem instances are derived from placements with as few as 10 modules (lin01, lin02, lin03) to placements with 2560 modules (lin34, lin35, lin36, lin37). To give the reader a visual impression of such a problem instance, a visualization of lin23 is shown in Figure 7.14.

Table 7.1: Information on the set of generated routing graph benchmark instances, stating the number of nodes, the number of edges, and the number of pins in graph . Also the optimal solution values are shown in the optimal solution column. The numbers in the shaded area are best known upper bounds (and thus those problems have, as yet, unknown optimal solutions).

name   nodes  edges  pins  optimal solution
lin01     53     80     4       503
lin02     55     82     6       557
lin03     57     84     8       926
lin04    157    266     6      1239
lin05    160    269     9      1703
lin06    165    274    14      1348
lin07    307    526     6      1885
lin08    311    530    10      2248
lin09    313    532    12      2752
lin10    321    540    20      4132
lin11    816   1460    10      4280
lin12    818   1462    12      5250
lin13    822   1466    16      4609
lin14    828   1472    22      5824
lin15    840   1484    34      7145
lin16   1981   3633    12      6618
lin17   1989   3641    20      8405
lin18   1994   3646    25      9714
lin19   2010   3662    41     13268
lin20   3675   6709    11      6673
lin21   3683   6717    20      9143
lin22   3692   6726    28     10519
lin23   3716   6750    52     17560
lin24   7998  14734    16     15076
lin25   8007  14743    24     17803
lin26   8013  14749    30     21757
lin27   8017  14753    36     20678
lin28   8062  14798    81     32584
lin29  19083  35636    24     23765
lin30  19091  35644    31     27684
lin31  19100  35653    40     33248
lin32  19112  35665    53     41444
lin33  19177  35730   117     58017
lin34  38282  71521    34     46244
lin35  38294  71533    45     51996
lin36  38307  71546    58     57849
lin37  38418  71657   172    102733
Figure 7.14: An optimal solution to the routing problem instance lin23 consisting of 3716 nodes, 6750 edges and 52 pins. The total wire length is 17560.

512Mbytes of RAM. All computation times are measured using the getrusage() system call. Table 7.2 shows the experimental results of several routing heuristics on the previously defined set of global routing graph instances. It is clear from these results that MBB routing is fastest of all, but the routing estimations it provides are disastrous; not only is the deviation from optimal very large (about -46% on average), it also varies from about -6% to almost -80%. Also, we can see directly that ADBH performs very poorly, which is quite surprising. Because both run-time performance and solution quality are extremely poor for ADBH, we disregard it in our further discussion. We can also see that routing algorithm SPBH I performs best. Not only does it produce the highest quality results, which deviate less than 3% from the optimum on average, but its computation times are also very modest.

Figure 7.15 shows the solution of algorithm SPBH I to problem instance lin23. This result should be compared with the optimal solution shown in Figure 7.14. It should be noted that the shown routing solution with length 18341 deviates 4.45% from optimal. However, this is barely assessable by visual inspection. The shaded rectangles are modules; 320 in total. Furthermore, the thin black lines are unexplored edges while the thin grey lines are explored edges. We can see that the right side of the plot contains a vertical region which has been left unexplored by the search wave. Based on these results which suggest SPBH I as a most promising routing heuristic, a
Table 7.2: Experimental evaluation results of several routing heuristics. All times are in CPU seconds of a Linux 2.4 operating system running on an Intel Pentium PIII 800MHz system. The CPU times are averaged over 100 runs. The 0.00e+0 times (assumed equal to zero) were too small to measure.

                 MBB                     SPH                    SPBH I                  SPBH II                   ADBH
name     cost   % opt     time   cost   % opt     time   cost   % opt     time   cost   % opt     time   cost   % opt     time
lin01     475   -5.57  0.00e+0    503    0.00  1.00e-4    503    0.00  1.00e-4    603   19.88  0.00e+0    503    0.00  1.00e-4
lin02     515   -7.54  0.00e+0    557    0.00  2.00e-4    557    0.00  1.00e-4    606    8.80  2.00e-4    606    8.80  4.00e-4
lin03     560  -39.52  0.00e+0    935    0.97  2.00e-4    932    0.65  2.00e-4    953    2.92  2.00e-4   1113   20.19  8.00e-4
lin04     975  -21.31  0.00e+0   1267    2.26  5.00e-4   1239    0.00  4.00e-4   1371   10.65  2.00e-4   1780   43.66  1.00e-3
lin05    1208  -29.07  0.00e+0   1813    6.46  7.00e-4   1770    3.93  5.00e-4   1849    8.57  3.00e-4   1957   14.91  1.90e-3
lin06     831  -38.35  0.00e+0   1393    3.34  6.00e-4   1348    0.00  4.00e-4   1579   17.14  5.00e-4   1476    9.50  3.40e-3
lin07    1328  -29.55  0.00e+0   1897    0.64  9.00e-4   1897    0.64  1.00e-3   1941    2.97  4.00e-4   2079   10.29  1.70e-3
lin08    1857  -17.39  0.00e+0   2649   17.84  1.10e-3   2280    1.42  9.00e-4   2347    4.40  5.00e-4   2711   20.60  4.30e-3
lin09    1695  -38.41  0.00e+0   3107   12.90  1.10e-3   2785    1.20  1.00e-3   3199   16.24  7.00e-4   3931   42.84  6.20e-3
lin10    2143  -48.14  0.00e+0   4773   15.51  1.40e-3   4294    3.92  1.20e-3   4463    8.01  9.00e-4   6202   50.10  1.53e-2
lin11    2730  -36.21  0.00e+0   4323    1.00  3.80e-3   4335    1.29  3.20e-3   4484    4.77  9.00e-4   6237   45.72  1.33e-2
lin12    3240  -38.29  0.00e+0   5314    1.22  4.10e-3   5354    1.98  3.80e-3   5587    6.42  8.00e-4   6379   21.50  1.72e-2
lin13    2741  -40.53  0.00e+0   4643    0.74  2.70e-3   4827    4.73  2.80e-3   5119   11.07  1.20e-3   6339   37.54  2.66e-2
lin14    3222  -44.68  0.00e+0   6971   19.69  4.00e-3   6020    3.37  3.70e-3   6517   11.90  1.60e-3  10385   78.31  5.85e-2
lin15    3330  -53.39  0.00e+0   7350    2.87  3.80e-3   7345    2.80  3.80e-3   9040   26.52  2.60e-3  12062   68.82  1.25e-1
lin16    4323  -34.68  0.00e+0   6688    1.06  1.08e-2   6865    3.73  1.16e-2   7221    9.11  1.50e-3   9250   39.77  5.61e-2
lin17    4526  -46.15  0.00e+0   9552   13.65  1.31e-2   9382   11.62  1.36e-2   9297   10.61  2.30e-3  12849   52.87  1.27e-1
lin18    4680  -51.82  0.00e+0  10414    7.21  1.26e-2  10665    9.79  1.23e-2  11765   21.11  3.60e-3  16497   69.83  2.22e-1
lin19    5581  -57.94  0.00e+0  14056    5.94  1.33e-2  13777    3.84  1.26e-2  15224   14.74  5.60e-3  26698  101.22  5.97e-1
lin20    4624  -30.71  0.00e+0   6968    4.42  1.90e-2   6808    2.02  1.81e-2   8100   21.38  3.00e-3  10062   50.79  9.16e-2
lin21    5360  -41.38  0.00e+0   9574    4.71  2.03e-2   9430    3.14  1.81e-2  10767   17.76  3.70e-3  14096   54.17  2.88e-1
lin22    5043  -52.06  0.00e+0  11337    7.78  2.07e-2  10898    3.60  1.99e-2  12081   14.85  4.40e-3  19339   83.85  5.87e-1
lin23    6668  -62.03  0.00e+0  19564   11.41  2.51e-2  18341    4.45  2.65e-2  20481   16.63  1.24e-2  42019  139.29  2.37e+0
lin24    9072  -39.82  1.00e-4  17655   17.11  5.93e-2  15929    5.66  6.04e-2  16694   10.73  7.60e-3  20017   32.77  4.69e-1
lin25    9428  -47.04  0.00e+0  19150    7.57  5.47e-2  18824    5.73  5.05e-2  21641   21.56  1.28e-2  34120   91.65  1.36e+0
lin26   10066  -53.73  0.00e+0  22480    3.32  5.84e-2  22516    3.49  5.58e-2  26353   21.12  1.21e-2  44279  103.52  2.24e+0
lin27    9406  -54.51  0.00e+0  22077    6.77  5.55e-2  21387    3.43  5.28e-2  24112   16.61  1.09e-2  44365  114.55  2.88e+0
lin28   10198  -68.70  0.00e+0  34582    6.13  5.76e-2  33642    3.25  5.32e-2  39125   20.07  4.01e-2  99705  205.99  1.56e+1
lin29   11904  -49.91  0.00e+0  24511    3.14  1.31e-1  24835    4.50  1.29e-1  27612   16.19  1.63e-2  46332   94.96  4.16e+0
lin30   14047  -49.26  0.00e+0  28940    4.54  1.41e-1  28780    3.96  1.33e-1  32506   17.42  1.81e-2  66750  141.11  7.28e+0
lin31   12984  -60.95  0.00e+0  34645    4.20  1.37e-1  33293    0.14  1.32e-1  37798   13.69  2.89e-2  68879  107.17  1.13e+1
lin32   14917  -64.01  1.00e-4  41606    0.39  1.44e-1  41544    0.24  1.39e-1  47818   15.38  3.47e-2 100909  143.48  2.16e+1
lin33   15975  -72.46  0.00e+0  61400    5.83  1.42e-1  58173    0.27  1.36e-1  65896   13.58  1.16e-1 169658  192.43  2.34e+2
lin34   21689  -53.10  1.00e-4  48368    4.59  3.16e-1  47810    3.39  3.17e-1  51373   11.09  3.44e-2 105095  127.26  3.79e+1
lin35   20189  -61.17  0.00e+0  52766    1.48  2.97e-1  52766    1.48  2.95e-1  61207   17.71  4.27e-2 123109  136.77  5.58e+1
lin36   19053  -67.06  0.00e+0  58853    1.74  3.04e-1  57849    0.00  2.70e-1  62596    8.21  5.45e-2 159935  176.47  1.60e+2
lin37   21420  -79.15  0.00e+0 105211    2.41  3.41e-1 103122    0.38  2.99e-1 119808   16.62  4.79e-1 355800  246.33  4.11e+3
average        -45.56 8.11e-06           5.70 6.48e-02           2.81 6.16e-02          13.69 2.58e-02          80.51 1.26e+02
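The "% opt" columns of Table 7.2 follow from the solution cost and the optimal (or best known) cost of Table 7.1. A minimal sketch of this arithmetic is given below; the function name is an assumption for this example.

```python
def percent_deviation(cost: float, reference: float) -> float:
    """Deviation of a solution cost from a reference cost, in percent.

    Negative values mean the cost lies below the reference, as happens
    for the MBB estimate, which underestimates the routed wire length.
    """
    return 100.0 * (cost - reference) / reference
```

For instance, the SPBH I solution of lin23 (18341 against an optimum of 17560) gives a deviation of 4.45%, and the MBB estimate of lin01 (475 against 503) gives -5.57%, in agreement with the table.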
few additional experimental investigations are conducted. We tested the following variations of SPBH I:
Figure 7.15: A near-optimal SPBH I solution to the routing problem instance lin23 consisting of 3716 nodes, 6750 edges and 52 pins. The total wire length is 18341 which deviates 4.45% from the optimal solution shown in Figure 7.14.
ISPBH I ZZ: iterated version of SPBH I but instead of starting with an arbitrary pin, we start times with a shortest path between a different pair of pins;
ISPBH I Z: iterated version of SPBH I but instead of starting with an arbitrary pin, we start times with a different pin;
ISPBH I Z BIAS: version of ISPBH I Z in which, whenever an arbitrary decision needs to be taken, this decision is biased towards extending an edge in the direction of the center of gravity of the net.
Table 7.3 shows the results of the experiments conducted with these algorithms. For the moment, ignore the ISPBH I Z BIAS results. Shortly, we will explain why. It is clear from the results in this table that ISPBH I ZZ produces overall the best results with 1.4% deviation from the optimum on average. However, the computation times of ISPBH I ZZ are quite large, increasing rapidly with larger nets. Therefore, the faster ISPBH I Z is more suitable for use in an iterative framework since the differences in routing quality are not that large. An average improvement over ISPBH I Z is obtained by ISPBH I Z BIAS without additional computational overhead, by exploiting biasing information where normally arbitrary decisions are made by the algorithms. This biasing technique can also be applied to ISPBH I ZZ
to improve the solution cost slightly, without increasing computation time. However, the computation time of ISPBH I ZZ is too large for the algorithm to be practical anyway. We see that the solution quality of ISPBH I Z BIAS is comparable with ISPBH I ZZ while the computation time is two orders of magnitude lower.

Table 7.3: Experimental routing results obtained with variations on algorithm SPBH I. All times are in CPU seconds of a Linux 2.4 operating system running on an Intel Pentium PIII 800MHz system. The CPU times are averaged over 10 runs.

         SPBH I         ISPBH I ZZ               ISPBH I Z             ISPBH I Z BIAS
name       cost   cost   % opt     time   cost   % opt     time   cost   % opt     time
lin01       503    503    0.00  1.00e-3    503    0.00  1.00e-3    503    0.00  0.00e+0
lin02       557    557    0.00  4.00e-3    557    0.00  0.00e+0    557    0.00  1.00e-3
lin03       932    926    0.00  8.00e-3    926    0.00  1.00e-3    926    0.00  1.00e-3
lin04      1239   1239    0.00  1.20e-2   1267    2.26  2.00e-3   1239    0.00  2.00e-3
lin05      1770   1703    0.00  3.60e-2   1709    0.35  5.00e-3   1703    0.00  5.00e-3
lin06      1348   1348    0.00  6.30e-2   1348    0.00  5.00e-3   1348    0.00  5.00e-3
lin07      1897   1885    0.00  2.40e-2   1897    0.64  5.00e-3   1885    0.00  5.00e-3
lin08      2280   2252    0.18  7.80e-2   2252    0.18  9.00e-3   2248    0.00  9.00e-3
lin09      2785   2752    0.00  1.17e-1   2785    1.20  1.00e-2   2785    1.20  1.10e-2
lin10      4294   4248    2.81  4.01e-1   4256    3.00  2.00e-2   4204    1.74  2.00e-2
lin11      4335   4289    0.21  3.48e-1   4289    0.21  3.50e-2   4287    0.16  3.40e-2
lin12      5354   5301    0.97  5.81e-1   5314    1.22  4.70e-2   5314    1.22  4.40e-2
lin13      4827   4631    0.48  6.85e-1   4631    0.48  4.50e-2   4631    0.48  4.40e-2
lin14      6020   5981    2.70  1.55e+0   6056    3.98  7.50e-2   6049    3.86  7.80e-2
lin15      7345   7256    1.55  3.62e+0   7319    2.44  1.13e-1   7295    2.10  1.06e-1
lin16      6865   6696    1.18  1.53e+0   6688    1.06  1.39e-1   6664    0.70  1.41e-1
lin17      9382   8550    1.73  4.75e+0   8718    3.72  2.40e-1   8586    2.15  2.51e-1
lin18     10665  10044    3.40  7.61e+0  10133    4.31  3.13e-1  10184    4.84  3.13e-1
lin19     13777  13562    2.22  2.26e+1  13572    2.29  5.54e-1  13721    3.41  5.53e-1
lin20      6808   6677    0.06  2.15e+0   6887    3.21  2.26e-1   6717    0.66  2.22e-1
lin21      9430   9372    2.50  7.73e+0   9372    2.50  4.13e-1   9376    2.55  3.99e-1
lin22     10898  10726    1.97  1.51e+1  10758    2.27  6.03e-1  10675    1.48  6.05e-1
lin23     18341  17983    2.41  6.66e+1  18079    2.96  1.33e+0  18029    2.67  1.30e+0
lin24     15929  15518    2.93  1.49e+1  15856    5.17  1.04e+0  15519    2.94  1.05e+0
lin25     18824  18615    4.56  3.11e+1  18803    5.62  1.38e+0  18426    3.50  1.35e+0
lin26     22516  22358    2.76  5.43e+1  22430    3.09  1.91e+0  22218    2.12  1.89e+0
lin27     21387  21427    3.62  7.51e+1  21569    4.31  1.92e+0  21240    2.72  2.13e+0
lin28     33642  33583    3.07  3.91e+2  33655    3.29  4.26e+0  33663    3.31  4.83e+0
lin29     24835  24339    2.42  7.50e+1  24339    2.42  3.12e+0  24329    2.37  3.40e+0
lin30     28780  28463    2.81  1.38e+2  28517    3.01  4.29e+0  28601    3.31  4.61e+0
lin31     33293  33516    0.81  2.32e+2  33668    1.26  5.67e+0  33248    0.00  5.97e+0
lin32     41544  41541    0.23  4.38e+2  41516    0.17  7.89e+0  41444    0.00  8.30e+0
lin33     58173  58194    0.31  1.90e+3  58454    0.75  1.67e+1  58017    0.00  1.72e+1
lin34     47810  46886    1.39  3.57e+2  47300    2.28  1.16e+1  46244    0.00  1.18e+1
lin35     52766  51996    0.00  5.97e+2  52134    0.27  1.43e+1  52651    1.26  1.48e+1
lin36     57849  58386    0.93  1.00e+3  58531    1.18  1.71e+1  57960    0.19  1.74e+1
lin37    103122 102781    0.05  9.86e+3 102710   -0.02  5.57e+1 102733    0.00  5.62e+1
average                   1.36 4.13e+02           1.92 4.08e+00           1.38 4.19e+00
Summarizing, SPBH I and ISPBH I Z BIAS are the most promising candidates for routing in an iterative optimization framework (see footnote 12). It depends, among other things, on the typical size of a net whether or not it is worth trading off computation time against solution quality.

For comparison purposes it is interesting to contrast the heuristic graph SMT results with optimal rectilinear SMT (RSMT) and Euclidean SMT (ESMT) results (see footnote 13). Recall that the RSMT and ESMT solutions ignore modules in the plane. Consequently, wires can run over modules, which by definition is undesirable. However, the results do give a good indication of how much solution quality we lose by imposing a non-over-the-cell-routing constraint. The optimal results have been obtained using Geosteiner 3.0 written by Warme et al. [125], which is considered a state-of-the-art tool for computing RSMTs and ESMTs. Table 7.4 summarizes the outcomes of the experiments, which are performed on the same hardware platform as the other routing experiments. The average improvement of RSMT solutions over near-optimal heuristic ISPBH I Z BIAS solutions is 6.6%. Since the ISPBH I Z BIAS solutions are themselves about 1.4% above the optimum on average (Table 7.3), this means that RSMT lengths are about 5% shorter than optimal graph SMT lengths. Of course, ESMT improves on these results. Note that the average CPU time of RSMT is more than an order of magnitude smaller than that of the ISPBH I Z BIAS heuristic and comparable with that of the SPBH I heuristic, while ESMT is on average roughly four times faster than ISPBH I Z BIAS.

12 In principle, it is also possible to apply biasing techniques to SPBH I, thereby improving solution quality while not enlarging computation time.
13 The ESMT is similar to the RSMT except for the fact that edges are not restricted to horizontal and vertical directions.

Table 7.4: Experimental routing results of the RSMT and ESMT problems obtained using the Geosteiner 3.0 tool. All times are in CPU seconds of a Linux 2.4 operating system running on an Intel Pentium PIII 800MHz system. The columns headed by % dev. represent the deviation of the solution cost with respect to the ISPBH I Z BIAS solution and are not related to the optimal solution.

        ISPBH I Z BIAS            RSMT                      ESMT
name      cost     time   cost  % dev.     time   cost  % dev.     time
lin01      503  0.00e+0    501   -0.40     0.00    430  -14.51     0.00
lin02      557  1.00e-3    557    0.00     0.00    466  -16.34     0.00
lin03      926  1.00e-3    831  -10.26     0.00    711  -23.22     0.01
lin04     1239  2.00e-3   1089  -12.11     0.01   1003  -19.05     0.03
lin05     1703  5.00e-3   1550   -8.98     0.01   1421  -16.56     0.02
lin06     1348  5.00e-3   1286   -4.60     0.01   1150  -14.69     0.20
lin07     1885  5.00e-3   1852   -1.75     0.00   1618  -14.16     0.02
lin08     2248  9.00e-3   2072   -7.83     0.00   1861  -17.22     0.03
lin09     2785  1.10e-2   2575   -7.54     0.00   2283  -18.03     0.06
lin10     4204  2.00e-2   3789   -9.87     0.02   3230  -23.17     0.16
lin11     4287  3.40e-2   4108   -4.18     0.00   3475  -18.94     0.05
lin12     5314  4.40e-2   4986   -6.17     0.01   4460  -16.07     0.08
lin13     4631  4.40e-2   4282   -7.54     0.00   3804  -17.86     0.10
lin14     6049  7.80e-2   5637   -6.81     0.00   4719  -21.99     0.11
lin15     7295  1.06e-1   6794   -6.87     0.01   5996  -17.81     0.43
lin16     6664  1.41e-1   6374   -4.35     0.00   5542  -16.84     0.03
lin17     8586  2.51e-1   8102   -5.64     0.01   7117  -17.11     0.19
lin18    10184  3.13e-1   9155  -10.10     0.01   7929  -22.14     0.39
lin19    13721  5.53e-1  12664   -7.70     0.03  11271  -17.86     1.70
lin20     6717  2.22e-1   6512   -3.05     0.01   5669  -15.60     0.04
lin21     9376  3.99e-1   8611   -8.16     0.00   7602  -18.92     0.11
lin22    10675  6.05e-1  10034   -6.00     0.01   8797  -17.59     0.26
lin23    18029  1.30e+0  16864   -6.46     0.12  14720  -18.35     1.57
lin24    15519  1.05e+0  14682   -5.39     0.01  12812  -17.44     0.19
lin25    18426  1.35e+0  16730   -9.20     0.00  14603  -20.75     0.16
lin26    22218  1.89e+0  21214   -4.52     0.02  18494  -16.76     0.46
lin27    21240  2.13e+0  19854   -6.53     0.07  17584  -17.21     1.90
lin28    33663  4.83e+0  31158   -7.44     0.62  27127  -19.42     4.91
lin29    24329  3.40e+0  22894   -5.90     0.01  20161  -17.13     0.17
lin30    28601  4.61e+0  26796   -6.31     0.01  22989  -19.62     0.44
lin31    33248  5.97e+0  30430   -8.48     0.02  26690  -19.72     1.02
lin32    41444  8.30e+0  38351   -7.46     0.23  33186  -19.93     1.41
lin33    58017  1.72e+1  53874   -7.14     0.80  46723  -19.47     7.63
lin34    46244  1.18e+1  42953   -7.12     0.02  38553  -16.63     0.95
lin35    52651  1.48e+1  48865   -7.19     0.03  41635  -20.92     0.85
lin36    57960  1.74e+1  53390   -7.88     0.14  47118  -18.71     2.12
lin37   102733  5.62e+1  95217   -7.32     1.44  82758  -19.44     7.66
average        4.19e+00          -6.60 9.95e-02         -18.30 9.58e-01
ders of magnitude faster than ISPBH I Z BIAS it is certainly more suitable during the initial phase of simulated annealing optimization. Another issue worth investigating with respect to improvement of run-time performance is the idea of multiple wave expansion as elaborated in [123] but explored in a somewhat different context. We already pointed out that the Geosteiner 3.0 tool produces optimal solutions much faster than our heuristic produces sub-optimal solutions. This seems paradoxical, but it should be noted that Geosteiner 3.0 is strongly optimized while our heuristic code is not. However, Geosteiner 3.0 cannot compute optimal Steiner minimal trees in graphs, which is an important requirement in our framework. As yet, no previously published results are known on fast heuristics for finding near-optimal Steiner minimal trees in graphs derived from actual module placements. Moreover, little was known about their absolute performance in relation to optimal solutions. Our work on routing has filled this gap. Last but not least, a rigorous optimization of the heuristic code should significantly improve run-time performance, resulting in a very fast global routing heuristic that yields near-optimal results.
3. Incrementally update the total wire-length by first subtracting the length of the net before the perturbation, and then adding the length of the re-computed routing of that net after the perturbation.

Unfortunately, a nasty problem arises which is not easily discovered. The underlying reason for this problem is given next. In all of our global routing heuristics we use a priority queue to store candidate nodes for expansion. One property of priority queues that has been left untouched up to now is the action that has to be taken when multiple elements with the same key exist in the queue. We assume, as is the default in these cases, that when the priority queue has to decide which element to choose from a set of elements with equal keys, it will make an arbitrary choice among these elements. Generally, there is no reason to deviate from this (usually) implicit assumption. However, in the present context we can easily sketch a scenario in which an arbitrary choice is unwanted. Figure 7.16(a) shows a part of a global routing graph in which the three pins , , and have to be routed. Pin is the start pin (Figure 7.16(a)). After a few steps, we arrive at node which is explored and put in the priority queue. When the algorithm extracts node from the queue for expansion, it will find node , relax it and put it in the queue, keyed by the distance from pin . Node is found subsequently, relaxed, and put in the queue, keyed by the distance from pin . Since edge has the same length as edge , nodes and reside in the queue with the same key value, say . When is the smallest key in the queue, and an extract-min queue operation is issued, it is not clear a priori whether element or element will be returned. Since we have implemented the priority queue with a splay tree (see Chapter 5), the priority queue is fully deterministic.
Therefore, even though we cannot predict which of the elements and is going to be extracted first, when the priority queue is built up using the same sequence of elements and operations, the choice between and is arbitrary but static. Figure 7.16(b) shows that in the case where is extracted first from the priority queue, node will become the parent node of node because node is explored first via node . This will eventually result in the routing of the three-pin net as shown in Figure 7.16(c), because when pin is found after expanding node , the backtrack pointers (shown as arrows in the figure) go via nodes , , , in that order. Pin is then connected to the routing tree using the traversed edges. On the other hand, when the priority queue is built up using a different sequence of elements and operations, for example due to the presence of a module , it is possible that instead of element , element is extracted first (even if both distance values relative to are equal again). This eventually results in the routing solution as shown in Figure 7.16(d). Due to the importance of this observation, we formulate it in the following theorem.

Theorem 9 The exact topology of a balanced search tree, such as the splay tree (see Chapter 5), depends on both the order and the total set of performed tree operations and tree elements.

Note that the addition of a single element, directly followed by the deletion of that element, also results in a different sequence and thus a possibly different topology of the tree structure. The scenario in which a node is added to an already existing routing tree is easily conceivable. For instance, when a module changes position, its associated escape lines can induce new edges and nodes in the routing graph. It is important to note that this might occur without re-routing the routing tree, simply because of the fact that we know in advance that
the routing tree should not change (the routing tree does not run through the affected region, nor is it directly connected to any of the affected modules).

Figure 7.16: An example scenario which demonstrates that even in the case of a deterministic priority queue which is used in our routing heuristic, the routing result depends on the environment of the modules that are of direct interest for the net to be routed. In this example a difference in extraction order of nodes and (keyed with the same distance value) yields two routing results with different lengths (thick lines); (c) versus (d).

Essentially, the arbitrary breaking of ties (for elements with the same key in the queue) causes this unwanted behavior of the routing algorithm. Although the tie-breaking choice does not have any obvious preference at that specific moment, it is likely to affect the final outcome of the routing solution (and consequently, the routing length). The choices we have to break a tie are:
choose the node independently of the contents and structure of the balanced search tree;
choose the node dependent on the contents and structure of the balanced search tree.
Clearly, we must select the first choice. A practical implementation could be based on breaking a tie by choosing the node closest to the center of gravity of all pins in the net. For clarity, we show next what will happen when a naive tie-breaking approach is adopted. It is clear from the previous discussion that the routing result of a net depends on the environment around the region defined by the modules connected to that net. Therefore, incremental re-routing is badly affected, because the total wire-length obtained by incremental means can deviate without bound from the real total wire-length. This can be seen from the algorithm shown in Figure 7.17. It is clear that essentially the operations
1 while do
2 perturb
3
4 compute new
5
6 if
7 then
8 undo perturb
9
10 compute new
11
12
13 od
Figure 7.17: A simplified simulated-annealing-based algorithm which we use to demonstrate the effect of environment-dependent re-routing results. stands for wire length and , are integers in the range , . Furthermore, we only consider a single net with length .

on lines 3 to 5 are equal to the operations performed on lines 9 to 11. Using an equivalent representation for lines 3 to 5, in the form of
it can be easily seen that we have essentially , which can be written as . Consequently, considering a single generation-rejection scenario, in which the condition on line 6 always evaluates to true, the algorithm of Figure 7.17 can be simplified to the algorithm shown in Figure 7.18. From this simple algorithm we can directly see that in
1 while do
2
3
4 od
Figure 7.18: Simplified generation-rejection algorithm.

the case of an ideal generation-rejection operation, the length of the net after the loop body should equal its length before. Consequently, in effect, nothing happens to the total wire length when the loop is iterated. However, from the previous
discussion we know that the routing heuristic can cause the re-computed length of a net to differ from its previous length. It is evident that the incrementally maintained total wire length can then start drifting uncontrollably. Hence, this unwanted effect renders the optimization algorithm useless when straightforward incremental routing techniques are applied. The previously discussed solution to resolve environment-dependent routing results is to use some sort of biasing technique which will decide in a predictable and balanced-tree-structure-independent manner in case of ties. A biasing technique which gives good routing results is preferred, of course. However, it is difficult to measure the quality of such a technique in a general context. An implementation of this technique could be as follows.
1. Extract all equal-keyed nodes from the queue and put them in a separate data structure. Then compute a unique center of gravity, which will be the reference point for the net in the current topology.
2. Choose the node from this data structure with the smallest x- or y-distance from the center of gravity. In case of ties, choose the node with the smallest x-coordinate first. If there is still a tie, choose the node with the smallest y-coordinate to resolve all ties.
Actually, the separate data structure is not strictly necessary for an efficient implementation. We can simply process the extracted nodes sequentially (in any order) and decide to adjust the parent node of an explored node with distance equal to the current exploration distance. The final decision is based on the criteria mentioned in step 2 of the previous algorithm. The validity of the sequential approach is a direct consequence of the associativity property of the mathematical min-operator. This method can be simplified by discarding the center of gravity information and assigning fixed priorities to each of the four edges departing from a node; the edge with higher priority will always have preference. Since there is no good reason to assume that the more sophisticated approach will perform better in practice in terms of routing quality, we choose the simplest approach, which is also the fastest.
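The idea of resolving ties independently of the queue's internal structure can be sketched in a few lines. The sketch below is illustrative only: it uses Python's binary heap instead of the splay tree of Chapter 5, it breaks ties on the lexicographic order of node coordinates rather than on fixed edge priorities, and the function name and graph encoding are assumptions of this example. Equal-distance candidates for the parent of a node are processed sequentially, and the preferred one is kept.

```python
import heapq

def shortest_path_tree(adj, source):
    """Dijkstra expansion with structure-independent tie-breaking.

    adj maps a node (x, y) to a list of (neighbor, edge_length) pairs.
    When several parents give a node the same distance, the parent with
    the lexicographically smallest (x, y) is kept, regardless of the
    order in which elements entered the queue.
    """
    dist = {source: 0}
    parent = {source: None}
    done = set()
    heap = [(0, source)]                    # key = (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                        # stale queue entry
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v] and v not in done and u < parent[v]:
                parent[v] = u               # tie: keep the preferred parent
    return dist, parent
```

With this rule the backtrack pointers, and hence the routed tree, no longer depend on the sequence of queue operations; reversing every adjacency list, for example, yields the same parent map.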
Enumerate all nets that run along any of the sides of the moved modules. Since we accumulated all global routing information and assigned this information to appropriate module boundaries in a previous iteration, it is a straightforward task to perform the enumeration without incurring additional computational complexity.
Take into account only the modules at the perimeter of the affected region, which is the region containing the moved modules. If a routing segment of a net runs into this affected region, then that segment needs re-routing. Since each affected net that is not connected to any of the moved modules must cross this perimeter, it is sufficient to process all perimeter modules.
A clear drawback of the first method is that all moved modules must be processed in order to find all affected nets. When the number of moved modules becomes larger, it becomes more advantageous to consider solely the perimeter modules, since the number of perimeter modules grows roughly proportionally with the square root of the number of moved modules. Furthermore, the second method immediately identifies the exact boundary locations of the routing segments that penetrate the affected region. The usefulness of knowing these boundary locations will be made clear shortly. As a result of the previous discussion, the second method is preferred over the first method.
Full Re-routing of Indirectly Affected Nets

With respect to the nets that cross the affected region defined by the moved modules, the simplest way to recompute the routing of a net is to compute the entire routing again. Using this approach, a high-quality routing is generally maintained at the expense of higher computational complexity. In case of large nets with many pins, this approach may burden the overall algorithm to a large extent, since most of the pins and thus the largest part of the routing segments lie in the unaffected region.
Partial Re-routing of Indirectly Affected Nets

A method to reduce the computational overhead of full re-routing of affected nets is through the use of partial re-routing. Essentially, only that part of the routing that lies in the affected region is re-computed. Besides a beneficial reduction in computational complexity for larger nets, there is an additional advantage with respect to the quality of a net. If the boundary crossings of the routing segments of a specific net are considered as virtual pins, we can actually obtain a gain in routing quality when we connect the virtual pins in a Steiner-minimal-tree-like manner. However, when naively viewing all virtual boundary pins induced by a net as pins of a single subnet which needs to be routed, a lurking danger is the introduction of loops in the interconnect of the total net. Clearly, measures should be taken to avoid the occurrence of loops, since this is unwanted by definition.14 A way to solve this problem is by keeping track of subsets of boundary pins that are disconnected due to the removal of routing segments in the affected region. Consequently, partial re-routing of affected nets can be favourable over full re-routing, especially if the partial portion is relatively small compared to the total routing of the net.
14 Under some circumstances it might actually be desirable to introduce loops, motivated by electromagnetic considerations, but this topic is outside the scope of this thesis.
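The notion of virtual pins can be made concrete with a small sketch. It assumes, for illustration only, that the affected region is an axis-parallel rectangle and that routing segments are axis-parallel; the function name and data layout are hypothetical. It reports the points where segments cross the perimeter, and deliberately leaves out the bookkeeping of disconnected pin subsets that is needed to avoid loops.

```python
def virtual_pins(segments, region):
    """Return the perimeter crossings of axis-parallel routing segments.

    segments: list of ((x1, y1), (x2, y2)) with x1 == x2 or y1 == y2.
    region:   (xmin, ymin, xmax, ymax) of the affected region.
    Segments that merely touch or run along the perimeter are ignored.
    """
    xmin, ymin, xmax, ymax = region
    pins = []
    for (x1, y1), (x2, y2) in segments:
        if y1 == y2 and ymin < y1 < ymax:        # horizontal segment
            lo, hi = sorted((x1, x2))
            pins += [(x, y1) for x in (xmin, xmax) if lo < x < hi]
        elif x1 == x2 and xmin < x1 < xmax:      # vertical segment
            lo, hi = sorted((y1, y2))
            pins += [(x1, y) for y in (ymin, ymax) if lo < y < hi]
    return pins
```

A segment that ends inside the region contributes one virtual pin, while a segment that passes completely through it contributes two.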
while do
foreach do
od
if random then
od
adjust temperature
Figure 7.19: A simplified simulated annealing algorithm with integrated placement and global routing optimization.

With the previous discussion of the construction of the global routing graph and the extended global routing graph, the above algorithm speaks for itself. Based on former experimental results with respect to the routing heuristics, we have chosen the shortest-paths-based heuristic I (SPBH I), which gives near-optimal routing results quickly. This
heuristic is used in conjunction with a non-incremental placement computation algorithm. Our main purpose is to show the impact of routing quality on placement quality when both are weighted equally in an optimization environment. We have not put any effort into improving run-time performance other than the most obvious implementation choices.
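For readers who prefer code, the overall structure of such an annealing loop can be sketched as follows. This is a generic sketch and not the implementation used for the experiments: the geometric cooling schedule, the Metropolis acceptance rule, all parameter values, and the caller-supplied functions `perturb`, `area` and `wirelength` are assumptions of this example. The cost is a weighted sum of chip area and total wire length, with equal weights by default.

```python
import math
import random

def anneal(state, perturb, area, wirelength, alpha=1.0, beta=1.0,
           t_start=1.0, t_end=1e-3, cooling=0.9, moves_per_t=50, seed=0):
    """Generic annealing loop over placements with a routing-aware cost."""
    rng = random.Random(seed)

    def cost(s):
        # Weighted sum of chip area and total wire length.
        return alpha * area(s) + beta * wirelength(s)

    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)       # non-incremental evaluation
            delta = candidate_cost - current_cost
            # Accept improvements always, deteriorations with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling                               # adjust temperature
    return best, best_cost
```

Because the cost is recomputed from scratch for every candidate, this sketch cannot suffer from the wire-length drift discussed earlier; it pays for that with run time, which is exactly the trade-off examined in the experiments.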
Table 7.5: Placement optimization results on a set of problem instances. All values are best of three runs. The parameters and denote the chip area weight and the total wire length weight, respectively (see (4.3)).
routing      blocks      CPU time [s]  final CA [mm^2]  slack space [%]  final WL, MBB [mm]  final WL, SPBH I [mm]
no routing   20                 14.18            0.210             5.35                   -                      -
no routing   40                 161.3            0.544             3.52                   -                      -
no routing   80                  1537             1.13             2.69                   -                      -
no routing   160                 5687             2.19             4.13                   -                      -
no routing   320                20996             3.97             5.89                   -                      -
no routing   49 (ami49)         688.5            36.48             2.85                   -                      -
MBB          20                  83.9            0.223             10.7               3.195                  5.248
MBB          40                 386.9            0.565             7.27               9.636                 15.960
MBB          80                  1373             1.20             8.64              18.759                 39.211
MBB          160                 5616             2.31             8.82              44.953                 99.641
MBB          320                20685             4.22            11.37             118.435                264.120
MBB          49 (ami49)         999.3            38.51             7.96              713.98                 796.18
SPBH I       20                1253.9            0.212             6.54               3.567                  4.892
SPBH I       40                8043.7            0.573             8.45              10.141                 14.440
SPBH I       80                 30174            1.240            11.35              20.264                 32.872
SPBH I       160               179990            2.386            11.84              50.565                 88.242
SPBH I       320              1332248            4.309            13.22             129.154                234.826
SPBH I       49 (ami49)         68206            39.06             9.26              661.87                 745.75
(avoiding obstacles). To substantiate this statement, Figure 7.20 shows the ratio of the total wire length obtained by SPBH I and MBB, as a function of the number of blocks (for three independent runs per problem size). It is easy to see that the correlation between SPBH I
and MBB is heavily problem-size dependent.

Figure 7.20: Ratio of total wire length computed by SPBH I and MBB, as a function of the number of blocks (size of randomly generated benchmark).

In general we may conclude that MBB routing is a bad total wire length predictor. Even stronger, MBB routing can significantly decrease placement quality. Furthermore, for the randomly generated problem instances, a clear trend towards a fixed ratio can be observed as the number of blocks grows larger. Despite the existence of a strong
correlation between the wire-length estimation obtained by SPBH I and MBB, it is merely a statistical measure which clearly does not guarantee that a decrease in SPBH I solution always corresponds to a decrease in MBB solution. Therefore, the practical usefulness of MBB routing is highly arguable.
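The gap between the two estimates is easy to reproduce on a toy example. The sketch below assumes that the MBB estimate is the half-perimeter of the minimum bounding box of a net's pins, and it replaces SPBH I by a plain breadth-first search on a unit grid, which is a much simpler stand-in; all names and the grid model are assumptions of this example.

```python
from collections import deque

def mbb_length(pins):
    """Half-perimeter of the minimum bounding box of the pins."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def routed_length(src, dst, blocked, width, height):
    """Length of a shortest obstacle-avoiding path on a unit grid (BFS).

    Returns None when dst cannot be reached from src.
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        if (x, y) == dst:
            return dist[dst]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in blocked and (nx, ny) not in dist:
                dist[(nx, ny)] = dist[(x, y)] + 1
                queue.append((nx, ny))
    return None

pins = [(0, 2), (4, 2)]
wall = {(2, 1), (2, 2), (2, 3)}          # a module between the two pins
estimate = mbb_length(pins)
actual = routed_length(pins[0], pins[1], wall, 5, 5)
```

For this two-pin net the bounding-box estimate is 4, while the shortest route around the obstacle has length 8. Moving the obstacle changes the routed length but leaves the MBB value untouched, which illustrates why a change in one measure need not be reflected in the other.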
7.9.3 Conclusions
Summarizing, we can conclude the following.
Our implementation of the simulated annealing optimization algorithm produces excellent packings, cf. [73, 69, 47]. Note that we did not optimize for speed, for instance by applying faster sequence pair algorithms [81, 47].

Coarse MBB routing does not correlate well with more accurate routing schemes such as SPBH I routing. Therefore it is not wise to apply MBB routing as a standard routing method to evaluate the quality of a routing-aware placement tool.15 However, we do observe quite a strong correlation between SPBH I and MBB among several runs of the same problem instance. In other words, a fixed ratio can be computed, but unfortunately this will give a distorted notion because it is not guaranteed to hold for every final solution.

The accuracy of global routing estimation significantly impacts the quality of a block placement, since a substantial decrease of about 6.3% to 16.2% in wire length can be observed for SPBH I-based optimization while the chip area increases by at most 3.3%.

Accurate global routing, as compared to MBB routing, incurs a large penalty on the run-time performance of the optimization framework. The main culprits for this are the explicit construction of a global routing graph, which has to be updated for each net, and the complexity of the accurate global routing algorithm itself.
Although the proposed optimization framework works well on pure block placement instances, this does not necessarily imply good behavior when additional constraints such as wire length are introduced. However, it is unlikely that MBB-based routing would render the optimization convergence behavior radically different from SPBH I-based routing. It should also be noted that for problem instances in which blocks have a large number of pins, the routing complexity starts dominating the behavior of the optimization tool. As a consequence, for accurate routing to be practical for large problem instances, the routing complexity should be reduced substantially. This could, for instance, be accomplished by employing incremental techniques in conjunction with thorough optimization of the source code.
A very important requirement for enabling incremental computation is that all data structures are fully dynamic. Of course, much effort is needed to implement these concepts properly. It is even more difficult to implement these ideas with high run-time performance in mind. Since practical run times of in-loop operations are very important in an iteration-intensive environment such as simulated annealing, optimizing the implementation should be considered, too.
Chapter 8
The aforementioned phenomena are discussed in detail and their role in the context of mixed-signal layout generation is made clear. In order to minimize the detrimental effects of these phenomena, accurate models are required. However, due to the iterative nature of our stochastic optimization engine, the models must have low associated computational complexity. We observe that in general very little effort has been dedicated to performance-driven optimization of layout in a pre-detailed-routing phase. We claim that performance issues should be taken into account as soon as possible in the optimization phase, preferably during placement and global routing, in order to obtain high-quality layouts. This claim is clearly supported by the approach taken in this thesis. Substrate coupling is a crosstalk phenomenon which has not been considered much in the context of layout generation. Therefore, investigations are performed to gain more insight on this topic in connection with integrated placement and routing. A novel method is proposed which takes substrate coupling into account without increasing computational complexity. Experimental results demonstrate the practical feasibility of the method. Furthermore, we show that the approach can be easily mapped into an incremental framework.
A very sophisticated model adds too much overhead to the overall computational complexity of the optimization framework, rendering the approach impractical.
A coarse inaccurate model can negatively impact performance of the optimization engine and cause convergence problems in the worst case.
A trade-off between accuracy and efficiency is in general unavoidable, but it is important to keep in mind that the model should produce a consistent estimation of reality. In other words, it is better to have a reasonably constant over-estimation of 30% than an apparently more accurate estimation whose error fluctuates between -10% and +10%.
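A small numerical illustration, with made-up cost values, shows why consistency matters more than nominal accuracy for an optimizer that only compares candidates. The cost values and error pattern below are hypothetical and chosen purely for illustration.

```python
def ranking(values):
    """Indices of the candidates, sorted from lowest to highest cost."""
    return sorted(range(len(values)), key=values.__getitem__)

# Hypothetical true costs of four candidate layouts.
true_cost = [100.0, 104.0, 108.0, 112.0]

# Consistent model: always 30% too high.
consistent = [1.30 * c for c in true_cost]

# Nominally more accurate model whose error swings between -10% and +10%.
errors = [+0.10, -0.10, +0.10, -0.10]
fluctuating = [(1 + e) * c for c, e in zip(true_cost, errors)]

same_order_consistent = ranking(consistent) == ranking(true_cost)
same_order_fluctuating = ranking(fluctuating) == ranking(true_cost)
```

The constant over-estimation preserves the ordering of the candidates, so every accept/reject decision is taken as if the true cost were known. The fluctuating model reorders them, even though each individual estimate is closer to the truth.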
8.3 Self-Parasitics
Inherent physical properties of the materials which form a layout induce parasitic phenomena which are not adequately modeled by many automated layout generation systems. The self-parasitics, which consist of resistance, capacitance and inductance of a wire, form a separate
class of unwanted effects based on the observation that the value of a self-parasitic solely depends on the geometrical properties of a single wire, independent of neighboring objects.
Figure 8.1: The sources of self-parasitics for a piece of interconnect.

capacitance depends on the thickness , width , height and length of the piece of wire. Furthermore, a so-called fringing capacitance exists which depends on the same parameters but with a different weighting. Of course the actual type of material of which the piece of interconnect is made plays a role, too. Besides the capacitance, there is also a series inductance and a series resistance associated with every piece of interconnect. Depending on the type of signal carried through the wire, and on the material and geometry of the wire, either one of them might be dominant over the other. To give the reader an impression of some typical values of parasitic elements for a 0.5 µm CMOS process: fF/ m , fF/ m ( m, m, m) for a metal1-metal2 scenario, and m for metal1, and m for metal2. Normally, higher metal layers have smaller sheet resistances.
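As a rough illustration of how such self-parasitics scale with wire geometry, the following sketch uses the common first-order expressions: resistance equals sheet resistance times the number of squares, and capacitance is the sum of an area term and a fringing term along both edges. These formulas, the function name, and the parameter values in the example call are assumptions of this sketch and are not the process values quoted above; inductance is not modeled.

```python
def wire_parasitics(length_um, width_um, r_sheet, c_area, c_fringe):
    """First-order self-parasitics of a straight wire.

    r_sheet  : sheet resistance in ohm per square
    c_area   : area capacitance in fF per square micrometer
    c_fringe : fringing capacitance in fF per micrometer of edge
    Returns (resistance in ohm, capacitance in fF).
    """
    resistance = r_sheet * length_um / width_um
    capacitance = c_area * length_um * width_um + 2 * c_fringe * length_um
    return resistance, capacitance

# Hypothetical numbers, for illustration only.
r, c = wire_parasitics(length_um=100.0, width_um=1.0,
                       r_sheet=0.07, c_area=0.03, c_fringe=0.04)
```

Doubling the wire width halves the resistance but increases only the area part of the capacitance, which is why both terms have to be tracked separately in a wire model.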
8.4 Crosstalk
Crosstalk is the net effect of undesired signal propagation via parasitic coupling between objects in the layout. An effective remedy to lessen crosstalk is spatial separation of the objects that are subject to crosstalk. However, this is not a trivial problem to solve in most practical cases. In this thesis we discuss two different types of crosstalk sources: crosstalk due to substrate coupling and crosstalk due to parasitic coupling capacitance. A third source of crosstalk is magnetic coupling, modeled with parasitic mutual inductance. This last issue lies outside the scope of this thesis. However, we note that magnetic coupling effects can also be incorporated into our framework with some effort. By improving our understanding of the mechanisms of crosstalk and their effect on performance, we can find better means to reduce detrimental crosstalk effects in the proposed integrated placement and routing framework.
[Figure 8.2 labels: 1 µm p-type, 0.1 Ω cm; 10 µm p-type, 10-15 Ω cm; 400 µm p- bulk, 7-15 Ω cm; 400 µm p+ bulk, 10 mΩ cm; panels (a) and (b).]
Figure 8.2: Two fundamentally different types of substrates: (a) a high-resistivity substrate, and (b) a low-resistivity substrate.

A simple model for a high-ohmic substrate, which was semi-empirically determined by Joardar [130], is shown in Figure 8.3. This model fits perfectly in our stochastic optimization framework, since it can be evaluated quickly. It should be noted that this model holds for guarded modules, but application to unguarded modules is justified [131] if we only want to minimize the influence of substrate coupling. Nodes A and B in the circuit are connected
to specific points in the integrated circuit, for instance the drains of two separate MOS transistors.

Figure 8.3: A simple substrate model.

In case of a bulk contact, the capacitor should be replaced by a short circuit. The resistances , , and strongly depend on process parameters and the geometry of layout modules. This information can be easily stored in a parameterized manner, since the layout module shapes are known in advance and very regular. Resistance is of most interest to us since it depends on the actual module placement. A closed-form expression for is (8.1) where is the effective lateral coupling length between two coupled objects, is the spacing between these objects, and , , are constants for a given process. The form of the
equation used to model is physically based, and obtained by solving the Laplace equations for two circular substrate contacts [131]. The slightly complicated form arises because three-dimensional effects are included. Furthermore, since no simple expression exists for rectangular geometries, the one available for circular contacts was used as an approximation. In the remainder of this chapter resistances and junction capacitances and will be ignored for simplicity, but without loss of generality.
Figure 8.4: A simplified scenario which shows all parasitic capacitances from a piece of conducting material to its environment.

depends on the distance between the objects on the same layer and the longest common length of the parallel-running parts of the lateral objects. More information on the values of these capacitances can be found in specific technology files. It is also possible to derive reasonably accurate closed-form expressions for many important parasitic phenomena in connection with wiring [126].
turn, can for instance lead to a reduced spurious-free dynamic range in the case of digital-to-analog converters. It is well-known that at least the systematic errors are strongly correlated with the location of the geometrical objects in a layout [80]. Therefore, it is important to take these effects into account in the layout phase so that the detrimental effects of process variations can be reduced as much as possible. Proper matching of the transistors of a differential pair is a well-known issue in analog layout design. In general we can speak of matched circuit or layout modules, in which a module can consist of a single transistor, but can also be a passive element, or even a small subcircuit. Furthermore, it is important to note that matching constraints are essentially equivalent to relative placement constraints. These types of constraints can be taken into account by an efficient placement representation such as the sequence pair. However, the approach proposed by Balasa and Lampaert [45] is not efficient in terms of computational complexity of a single constrained placement evaluation. Furthermore, their approach induces a more irregular cost landscape and, therefore, worse convergence of the simulated annealing optimization algorithm. An approach based on the constrained placement work of Tang and Wong [47] is likely to offer better results. Finally, we note that although virtually all analog layout matching efforts have been focused on symmetric placement, symmetric routing is at least as important in this context. The latter has, to the best of our knowledge, never been explored in depth in the context of layout generation. The term symmetric is in our notion not restricted to geometric symmetry. Although geometric symmetry is sufficient, we argue that it is not necessary, since symmetric signals through the matched interconnect are the ultimate goal. In this thesis we do not elaborate further on the issue of process variations in connection with mixed-signal layout generation.
needed. Therefore, we have to define exactly how a module is composed. The essentially 1-dimensional model of Joardar in the form of (8.1) is then generalized to two dimensions. In the context of sequence-pair-based block placement, we propose a novel method to handle the slack space that exists in most placements such that the impact of substrate coupling is minimized. The algorithm for accomplishing this is based on expansion of the core module and shifting the core module in the expanded module space so that the total impact of substrate coupling is minimized. More explicitly, in terms of the simple substrate coupling model of Figure 8.3, the task is to reduce to minimize the coupling, of which the actual impact is evaluated by means of a priori obtained sensitivity values. Note that the overall optimization procedure does not imply that placements with a large amount of slack space are seen as bad a priori. On the contrary, introducing additional slack space might reduce the overall impact of substrate coupling. With a properly chosen balance between chip area, total wire length, and substrate coupling impact, the simulated annealing algorithm stochastically searches for a placement which adheres to the given cost function (see (4.3)). Experimental results show the effectiveness and efficiency of the approach.
[Figure labels: core module, enclosing module.]
: the lower left coordinate of the enclosing module,
: the width and height of the enclosing module,
: x- and y-offset of the core module's lower left corner,
: width and height of the core module,
: top, right, bottom, and left routing space width.
If and then we call the enclosing module tight, otherwise the enclosing module is loose and we have expansion space. Hereafter, routing issues are disregarded (at least the details); we included them here for completeness.
Figure 8.6: Definition of skew and distance between two placed modules and .

metrical method to compute the effective coupling in cases where modules are skewed. In addition, it is made plausible that the original expression for should be modified slightly in order to incorporate the proximity effects of modules with different dimensions. These notions are shown in Figure 8.7. It is clear that the refinement of substrate coupling resistance in a 2-dimensional setting is based on geometrical arguments. For the case shown in Figure 8.7(a), two additional terms should be added to the computation of , giving it the following general shape when two modules are not fully skewed (Figures 8.7(a) and (b)):
asin
asin
(8.2)
where is an additional constant which is used for fitting. In the case of a fully skewed placement of two modules and , the lateral parallel coupling between and in the original sense of (8.1) has vanished. Instead, the following formula, which is derived from Figure 8.7(c), holds:
asin
.
(8.3)
(a) A modification of the original expression for coupling resistance takes into account the fringing effect.
(b) When modules are partially skewed, fringing effects arise from two adjacent sides of the modules.
(c) In the case where modules are fully skewed, we only have fringing effects and no lateral parallel effect.
Figure 8.7: The refined substrate coupling scenarios which take into account the fringing effects for proper estimation of the substrate coupling resistance in a 2-dimensional setting.

It is a straightforward task to express (8.2) and (8.3) in terms of the geometrical and spatial properties of the basic modules. Finally, the amount of substrate coupling between modules and is inversely proportional to resistance and denoted by .
us to the ultimate goal of minimizing the impact of substrate coupling. The impact (on a given performance measure) of substrate coupling from module to module is defined by1
(8.4)
where the sensitivity term is the substrate coupling sensitivity of the performance function defined on the affected module. The sensitivities can, for instance, be obtained a priori from circuit simulation. The noisiness of the noise-generating module depends on both the amplitude and the time-derivative of a predefined electrical property. Note that one module is assumed to be fixed in location, while the optimal position of the other module is to be determined. The problem of minimizing the impact of substrate coupling can be stated as follows.

Problem: Substrate Coupling Impact Minimization Problem
Instance: A placement of modules associated with a sequence pair, with given chip area dimensions.
Solutions: All possible non-overlapping absolute placements of the modules that do not violate the relative relationships dictated by the sequence pair.
Minimize: The total impact of substrate coupling as defined by (8.4), subject to the chip area dimensions.
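To make the quantities in the problem statement concrete, the following minimal Python sketch evaluates a pairwise impact under stated assumptions. The product form (sensitivity times noisiness times coupling) is an assumed reading of (8.4), and `toy_resistance` is a hypothetical stand-in for the resistance expressions (8.1) to (8.3); the names `Module`, `coupling` and `impact` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A placed module, reduced to its center point for this sketch."""
    x: float
    y: float
    noisiness: float     # how much noise the module injects (assumed scalar)
    sensitivity: float   # pre-computed sensitivity of its performance function


def toy_resistance(a: Module, b: Module, rho: float = 1.0) -> float:
    # Hypothetical model: resistance grows linearly with center distance.
    # It only stands in for the (lost) expressions (8.1)-(8.3).
    return rho * math.hypot(a.x - b.x, a.y - b.y)


def coupling(a: Module, b: Module) -> float:
    # Coupling is inversely proportional to the coupling resistance.
    return 1.0 / toy_resistance(a, b)


def impact(src: Module, dst: Module) -> float:
    # Assumed product form: sensitivity of the victim, noisiness of the source.
    return dst.sensitivity * src.noisiness * coupling(src, dst)


digital = Module(0.0, 0.0, noisiness=5.0, sensitivity=0.1)
analog = Module(2.0, 0.0, noisiness=0.2, sensitivity=3.0)
print(impact(digital, analog), impact(analog, digital))
```

In this toy model the coupling is symmetric in the two modules, while the impact generally is not, because noisiness and sensitivity belong to different modules.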
Note that the problem can be seen as a force-balanced constrained mechanical system with given initial conditions, but with strongly nonlinear relationships between the components. Clearly, it is too costly to solve this problem to optimality in the context of our stochastic optimization framework. Therefore, we simplify the problem in three respects.
1. We introduce a rectangular window around every module, which limits the number of surrounding modules that affect the module to be shifted to an optimal location. This is a reasonable limitation, since in practice the modules that lie further away will be shielded by closer modules.
2. We accept a sub-optimal solution that results from processing the modules sequentially in order of decreasing amount of expansion space. This is also an acceptable limitation, since subsequent modules can never be shifted by more than the maximum allowable amount, which lessens the effect on previously shifted modules.
3. Within the constrained minimum-impact location problem for a single module, any locally optimal solution is accepted. The underlying reason is that finding the global solution of this nonlinear function minimization problem in two variables is computationally expensive, while the additional gain might be small.
1 Note
that
, in general, but
2. Extract the module with the largest expansion space from the data structure and enumerate all of its neighbors that fall within the range window. Solve the (two-dimensional) function minimization problem and set the absolute position of the module to the solution.
3. If unprocessed modules remain, go to step 2.
Figure 8.8: The substrate coupling impact minimization algorithm.

The cost of step 1 is the sum of enumerating and expanding all modules and putting them in sorted order in a data structure. This can be performed in linear time, on average. Step 2 consists of extracting an unprocessed module with the largest expansion space from the data structure and enumerating the neighbors of that module. Since the modules are stored in sorted order, the extraction step takes constant time. The computational effort to enumerate the neighbors of a module depends on the size of the range window and is given by (5.4). Furthermore, the effort of finding a (local) minimum of the substrate coupling impact is roughly proportional to the number of terms in the function to be minimized, which in turn is proportional to the number of neighboring modules. Consequently, the total computational complexity is dominated by the latter. Since step 2 is repeated exactly once per module, the overall computational complexity of the algorithm is proportional to the number of modules multiplied by the number of neighbors per module. When the range window does not depend on the number of modules, the overall complexity is linear in the number of modules. Compared with a from-scratch computation of a packing, we may conclude that no additional computational overhead is induced by the substrate coupling impact minimization problem.
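The steps of Figure 8.8 can be sketched in Python as follows. This is an illustration under several assumptions, not the implementation used in this work: modules are reduced to center points, the expansion space is modeled as a symmetric slack `(dx, dy)` around the current position (assumed small enough that no overlap arises), `toy_resistance` is a hypothetical replacement for (8.1) to (8.3), the cost uses an assumed product form for the impact, and the local minimizer is a plain compass search chosen only because any locally optimal solution is acceptable.

```python
import math


def toy_resistance(p, q):
    # Hypothetical stand-in: resistance grows with center distance.
    return math.hypot(p[0] - q[0], p[1] - q[1]) + 1e-9


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def compass_search(f, start, lo, hi, tol=1e-6):
    """Local minimization of f(x, y) inside the box [lo, hi]."""
    x, y = start
    best = f(x, y)
    step = max(hi[0] - lo[0], hi[1] - lo[1]) / 4.0
    while step > tol:
        improved = False
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cx = clamp(x + dx, lo[0], hi[0])
            cy = clamp(y + dy, lo[1], hi[1])
            val = f(cx, cy)
            if val < best:
                x, y, best, improved = cx, cy, val, True
        if not improved:
            step /= 2.0  # refine only when no direction helps
    return (x, y)


def scim(pos, slack, noise, sens, half_window):
    """Greedy sketch: shift modules one by one, largest slack first."""
    pos = list(pos)
    n = len(pos)
    # Step 1: order modules by decreasing expansion space.
    order = sorted(range(n), key=lambda i: slack[i][0] + slack[i][1],
                   reverse=True)
    for i in order:  # Steps 2 and 3: process every module exactly once.
        xi, yi = pos[i]
        # Neighbors inside the rectangular range window. A linear scan is
        # used here for brevity; it is NOT the efficient enumeration of (5.4).
        nbrs = [j for j in range(n) if j != i
                and abs(pos[j][0] - xi) <= half_window
                and abs(pos[j][1] - yi) <= half_window]

        def cost(x, y, i=i, nbrs=nbrs):
            total = 0.0
            for j in nbrs:
                r = toy_resistance((x, y), pos[j])
                # Impact in both directions between module i and neighbor j.
                total += (sens[i] * noise[j] + sens[j] * noise[i]) / r
            return total

        lo = (xi - slack[i][0], yi - slack[i][1])
        hi = (xi + slack[i][0], yi + slack[i][1])
        pos[i] = compass_search(cost, (xi, yi), lo, hi)
    return pos
```

Because of the linear neighbor scan, this sketch runs in quadratic time in the number of modules; the linear behavior discussed above relies on a window-based enumeration whose cost does not depend on the number of modules.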
Table 8.1: Practical CPU times of one SA iteration with the proposed substrate coupling impact minimization (SCIM) method (second column), and the SCIM time alone (third column).
total time [s]    SCIM time [s]
0.871             0.075
2.012             0.101
3.792             0.133
6.240             0.156
9.873             0.178
14.221            0.206
21.380            0.250
Since a comparison-based sort incurs O(n log n) complexity for n modules, it is better to choose a sorting algorithm such as bucket sort. The use of bucket sort is justified, since we may assume that the input distribution of values is semi-random.
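As an illustration of the sorting step, the sketch below orders module indices by decreasing expansion space with a bucket sort. It assumes that the expansion space of each module is available as a single non-negative number; the function name `bucket_order_desc` and the choice of one bucket per module are illustrative assumptions.

```python
def bucket_order_desc(values, num_buckets=None):
    """Return indices of `values` in decreasing order using bucket sort.

    With roughly uniformly spread values, each bucket holds O(1) items on
    average, so the expected running time is linear in len(values).
    """
    n = len(values)
    if n == 0:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(range(n))  # all values equal: any order is sorted
    k = num_buckets or n
    buckets = [[] for _ in range(k)]
    for idx, v in enumerate(values):
        # Map the value range [lo, hi] onto bucket indices 0 .. k-1.
        b = min(int((v - lo) / (hi - lo) * k), k - 1)
        buckets[b].append(idx)
    order = []
    for bucket in reversed(buckets):  # highest-value bucket first
        bucket.sort(key=lambda i: values[i], reverse=True)
        order.extend(bucket)
    return order


print(bucket_order_desc([0.3, 2.0, 1.1, 0.7]))
```

If the values are strongly clustered, most modules end up in a few buckets and the per-bucket sort dominates, which is why the semi-random input assumption matters.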
Figure 8.9: Plots of the CPU time of one SA iteration as a function of the problem instance size: (a) total time and (b) SCIM time only.

Figure 8.9(a) shows a nonlinear relationship between the CPU time and the number of modules for one complete SA iteration, whereas Figure 8.9(b) shows a linear relationship for the substrate coupling impact minimization algorithm, as expected. Note that the packing computation algorithm is the one originally proposed by Murata et al. [48]. For illustration, graphical representations of the standard packing results and of the expanded and optimized packing results for a set of ten modules are shown in Figure 8.10.
Figure 8.10: Packing of 10 modules corresponding to SP (a) without and (b) with explicit representation of empty space using corner stitching, and (c) optimized with respect to substrate coupling impact for given sensitivity values.
8.7.7 Conclusions
We presented a new and efficient substrate coupling impact minimization (SCIM) algorithm that enables efficient incorporation of substrate problems into an iterative placement optimization loop. Substrate coupling has been recognized as one of the major physical design bottlenecks for high-performance, high-frequency mixed-signal circuits. Therefore, minimizing the impact of substrate coupling will result in better designs in fewer design iterations. Results of simulations performed on randomly generated medium to large problem instances clearly show that the practical run-time of the SCIM algorithm is linear in the problem instance size, which is optimal in the context of a from-scratch computation of a packing. It should be noted that more iterations are needed in order to incorporate the influence of the coupling capacitances.
to applying the SCIM algorithm given in Figure 8.8 to the restricted set of moved modules. It is easy to see that the overall computational complexity of the incremental algorithm is proportional to the number of moved modules. Note that this holds under the assumption that enumerating the moved modules can be performed efficiently, i.e. in time proportional to the number of moved modules.
Chapter 9
9.1 Conclusions
The implementation details of simulated annealing have a tremendous impact on the performance of the algorithm in practice. This issue is left undiscussed in virtually all papers that employ simulated annealing for global optimization. We have shown that knowledge of efficient algorithms and advanced data structures is of utmost importance when designing (new) efficient algorithms for mixed-signal layout generation.

We have proposed and implemented an efficient incremental framework for computing accurate block placements under the constraint of several user-definable parameters. The efficiency of the incremental approach is backed up by concise theoretical arguments. The average computational complexity of a single incremental computation is better than any previously reported result.

A new consistent (idempotent) linear-time placement-to-sequence-pair mapping algorithm is proposed. The algorithm is useful, for example, in the context of converting graphical user-interface data to an abstract format.

An improved, more robust, and easy-to-implement constrained block placement algorithm has been proposed, which improves significantly over previous results. However, the naive implementation, which leaves room for improvement, is slower than the original tuned algorithm.
A new method for constructing an efficient global routing graph from a placement of modules has been proposed. The average computational complexity of the method is a function of the number of placed modules, and under some reasonably weak conditions this complexity can be reduced further. An important feature of the new construction is that dynamic changes in the graph are supported and can be performed efficiently.

We have devised new efficient global routing algorithms for finding obstacle-avoiding routes of multi-pin nets in the proposed global routing graph. These heuristics have been extensively benchmarked on a large set of routing problem instances derived from sequence-pair placements. The heuristic results are compared with optimal results obtained using state-of-the-art third-party tools. The fact that not all problem instances were solvable to optimality demonstrates the difficulty of the problem instances (and of the routing problem).

A set of tests has been performed with the integrated accurate sequence-pair placement representation and accurate obstacle-avoiding global routing heuristic in the simulated annealing optimization loop. The outcome of our experiments demonstrates unambiguously that the current de facto standard minimal-bounding-box routing method does not qualify for finding good placements while minimizing actual global routing length.

Substrate coupling can be taken into account efficiently and in an incremental manner using a linear-complexity algorithm. Using pre-computed sensitivity values, we show that the impact of substrate coupling can easily be (locally) minimized.
Uniform wire distribution implies that modules are expanded evenly. This, in turn, means that the quality of an interconnect will not suffer too much from the change in module positions. Although the quality of an interconnect might degrade due to its longer length compared to an optimal-length interconnect in the expanded scenario, the relative quality should be quite insensitive to a uniform expansion operation.

From a manufacturability/yield point of view, it can be advantageous to spread power-dissipating wires over a larger area, so that the temperature is more evenly distributed over the chip and the occurrence of so-called hot spots, which can eventually cause performance degradation over time, might be prevented.
The detrimental effect of parasitic coupling can be somewhat lessened by proper wire spreading. At least the impact of wire coupling can be assessed and handled more easily when the number of wires in a single region is lessened.
In order to take the step to detailed routing, we need to make sure that enough routing space is reserved. Reserving enough space can be seen as a module expansion problem: how much do we need to expand each module? This, in turn, depends on which global route segments are assigned to which module, mapping the expansion problem into an assignment problem. The latter is an important problem which needs to be investigated in detail.

Last but not least, we note that temperature issues are also important to consider in the context of dealing with other physical constraints, since temperature gradients can be considered to be as bad as process variations, especially in connection with matched circuit components. However, to perform temperature analysis accurately, we need to estimate power accurately. It is well known that the latter is a non-trivial problem and an active field of research. Efficient models to estimate power can help us in quickly determining (dominant) temperature profiles, which in turn can be used in an iterative optimization framework. Much research still needs to be performed in this respect.
Bibliography
[1] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
[2] E. Malavasi, Techniques for Performance-Driven Layout of Analog Integrated Circuits, M.S. thesis, University of California, Berkeley, 1993.
[3] E. Charbon, Constraint-Driven Analysis and Synthesis of High-Performance Analog IC Layout, Ph.D. thesis, University of California, Berkeley, 1995.
[4] H. Chang, A Top-Down, Constraint-Driven Design Methodology for Analog Integrated Circuits, Ph.D. thesis, University of California, Berkeley, 1994.
[5] E.S. Ochotta, R.A. Rutenbar, and L.R. Carley, Synthesis of High-Performance Analog Circuits in ASTRX/OBLX, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 3, pp. 273-294, March 1996.
[6] J.M. Cohn, D.J. Garrod, R.A. Rutenbar, and L.R. Carley, KOAN/ANAGRAM II: New Tools for Device-Level Analog Placement and Routing, IEEE Journal of Solid-State Circuits, vol. 26, no. 3, pp. 330-342, March 1991.
[7] J.M. Cohn, D.J. Garrod, R.A. Rutenbar, and L.R. Carley, Analog Device-Level Layout Automation, The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, 1994.
[8] K. Lampaert, Analog Layout Generation for Performance and Manufacturability, Ph.D. thesis, Katholieke Universiteit Leuven, 1998.
[9] C. Lin, D.M.W. Leenaerts, and A.H.M. van Roermund, Faster Incremental VLSI Placement Optimization, in Proc. 15th European Conference on Circuit Theory and Design, 2001, vol. II, pp. 153-156.
[10] C. Lin and D.M.W. Leenaerts, A New Efficient Method for Substrate-Aware Device-Level Placement, in Proc. ASP-DAC 2000, January 2000, pp. 533-536.
[11] B.R. Stanisic, N.K. Verghese, R.A. Rutenbar, L.R. Carley, and D.J. Allstot, Addressing Substrate Coupling in Mixed-Mode ICs: Simulation and Power Distribution Synthesis, IEEE Journal of Solid-State Circuits, vol. 29, pp. 226-238, 1994.
[12] G.A.M. van der Plas, J. Vandenbussche, W. Sansen, M.S.J. Steyaert, and G.G.E. Gielen, A 14-bit intrinsic accuracy Q^2 random walk CMOS DAC, IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1708-1718, December 1999.
[13] T. Koch, A. Martin, and S. Voß, SteinLib: An Updated Library on Steiner Tree Problems in Graphs, Tech. Rep. ZIB-Report 00-37, Konrad-Zuse-Zentrum für Informationstechnik Berlin, http://elib.zib.de/steinlib, 2000.
[14] T. Lengauer, Combinatorial algorithms for integrated circuit layout, Wiley, Chichester, 1990.
[15] N.A. Sherwani, Algorithms for VLSI physical design automation, Kluwer Academic, 1993.
[16] R.J. Baker, H.W. Li, and D.E. Boyce, CMOS Circuit Design, Layout and Simulation, IEEE Press Series on Microelectronic Systems. IEEE Press, 1998.
[17] E. Malavasi and E. Charbon, Constraint transformation for IC physical design, IEEE Transactions on Semiconductor Manufacturing, vol. 12, no. 4, pp. 386-395, 1999.
[18] G. Jusuf, P.R. Gray, and A.L. Sangiovanni-Vincentelli, CADICS - Cyclic Analog-to-Digital Converter Synthesis, in Proc. IEEE International Conference on Computer Aided Design, November 1990, pp. 286-289.
[19] F.K. Hwang, D.S. Richards, and P. Winter, The Steiner Tree Problem, vol. 53 of Annals of Discrete Mathematics, North-Holland, Amsterdam, 1992.
[20] A.B. Kahng and G. Robins, On Optimal Interconnections for VLSI, The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, 1995.
[21] M.R. Garey and D.S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, Freeman, 1979.
[22] C. Bliek, P. Spellucci, L.N. Vicente, A. Neumaier, L. Granvilliers, E. Monfroy, F. Benhamou, E. Huens, P. van Hentenryck, D. Sam-Haroud, and B. Faltings, Algorithms for Solving Nonlinear Constrained and Optimization Problems: The State of The Art, http://www.mat.univie.ac.at/~neum/glopt/coconut/, June 2001.
[23] C. Darwin, The origin of species, John Murray, London, 1859.
[24] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, 1973.
[25] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, Optimization by simulated annealing, Science, vol. 220, no. 4598, pp. 671-680, 1983.
[26] R. Dawkins, The selfish gene, Oxford University Press, Oxford, 1976.
[27] J.H. Holland, Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor, 1975.
[28] E.H.L. Aarts and J. Korst, Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing, Wiley-Interscience series in discrete mathematics and optimization. Wiley, Chichester, 1989.
[29] L. Ingber, Simulated annealing: practice versus theory, Mathl. Comput. Modelling, vol. 18, no. 11, pp. 29-57, 1993.
[30] S.W. Stepniewski and A.J. Keane, Pruning back-propagation neural networks using modern stochastic optimization techniques, Neural Computing & Applications, vol. 5, pp. 76-98, 1997.
[31] S. Chalup and F. Maire, A study on hill climbing algorithms for neural network training, in Proc. 1999 Congress on Evolutionary Computation, 1999, pp. 2014-2021.
[32] K.D. Boese and A.B. Kahng, Best-So-Far vs. Where-You-Are: Implications for Optimal Finite-Time Annealing, Systems and Control Letters, vol. 22, no. 1, pp. 71-78, January 1994.
[33] J. Cong, T. Kong, F. Liang, J.S. Liu, W.H. Wong, and D. Xu, Dynamic Weighting Monte Carlo for Constrained Floorplan Designs in Mixed Signal Application, in Proc. ASP-DAC 2000, 2000, pp. 277-282.
[34] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, McGraw Hill, 1990.
[35] R.H.J.M. Otten and L.P.P.P. van Ginneken, The annealing algorithm, vol. 72 of The Kluwer International Series in Engineering and Computer Science, Kluwer Academic, 1989.
[36] L. Ingber, Very Fast Simulated Re-Annealing, Journal of Mathl. Comput. Modelling, vol. 12, pp. 967-973, 1989.
[37] B. Hajek, Cooling schedules for optimal annealing, Mathematics of operations research, vol. 13, no. 2, pp. 311-329, 1988.
[38] M. Huang and A. Sangiovanni-Vincentelli, An Efficient General Cooling Schedule for Simulated Annealing, in Proc. International Conference on Computer-Aided Design, 1986, pp. 381-384.
[39] A.B. Kahng, Classical Floorplanning Harmful?, in Proc. ISPD, 2000, pp. 207-213.
[40] H. Onodera, Y. Taniguchi, and K. Tamaru, Branch-and-Bound Placement for Building Block Layout, in Proc. ACM/IEEE Design Automation Conference, 1991, pp. 433-439.
[41] C. Sechen, VLSI placement and global routing using simulated annealing, vol. 54 of The Kluwer international series in engineering and computer science, Kluwer Academic, Dordrecht, 1988.
[42] W. Kruiskamp, Analog design automation using genetic algorithms and polytopes, Ph.D. thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 1996.
[43] K. Francken, P. Vancorenland, and G. Gielen, DAISY: a simulation-based high-level synthesis tool for ΔΣ modulators, in Proc. IEEE International Conference on Computer Aided Design, November 2000, pp. 188-192.
[44] L. Ingber and B. Rosen, Genetic algorithms and very fast simulated reannealing: A comparison, Mathematical Computer Modeling, vol. 16, no. 11, pp. 87-100, 1992.
[45] F. Balasa and K. Lampaert, Module Placement for Analog Layout Using the Sequence-Pair Representation, in Proc. ACM/IEEE Design Automation Conference, 1999, pp. 274-279.
[46] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, VLSI/PCB Placement with Obstacles Based on Sequence Pair, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, pp. 61-68, 1998.
[47] X. Tang and D.F. Wong, Fast-SP: A Fast Algorithm for Block Placement based on the Sequence Pair, in Proc. ASP-DAC 2001, 2001, pp. 521-526.
[48] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, VLSI Module Placement Based on Rectangle-Packing by the Sequence-Pair, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, pp. 1518-1524, 1996.
[49] P. Diaconis and D. Stroock, Geometric bounds for eigenvalues of Markov chains, The Annals of Applied Probability, vol. 1, no. 1, pp. 36-61, 1991.
[50] J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer Series in Statistics. Springer Verlag, March 2001.
[51] R.L. Graham, D.E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, Addison-Wesley Publishing Company, 1989.
[52] J.K. Ousterhout, Corner Stitching: A Data-Structuring Technique for VLSI Layout Tools, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 3, no. 1, pp. 87-100, January 1984.
[53] W. Pugh, Skip Lists: A Probabilistic Alternative to Balanced Trees, Communications of the ACM, vol. 33, no. 6, pp. 668-676, June 1990.
[54] D.D. Sleator and R.E. Tarjan, Self-Adjusting Binary Search Trees, Journal of the Association of Computing Machinery, vol. 32, no. 3, pp. 652-686, July 1985.
[55] R. Rönngren and R. Ayani, A Comparative Study of Parallel and Sequential Priority Queue Algorithms, ACM Transactions on Modeling and Computer Simulation, vol. 7, no. 2, pp. 157-209, April 1997.
[56] C. Martínez and S. Roura, Randomized binary search trees, J. ACM, vol. 45, no. 2, pp. 288-323, March 1998.
[57] G. M. Adelson-Velskii and Y. M. Landis, An algorithm for the organization of information, Doklady Akademii Nauk SSSR, vol. 146, pp. 263-266, 1962, English translation in Soviet Math. Dokl., 3:1259-1262.
[58] P. van Emde Boas, Preserving order in a forest in less than logarithmic time, in Proc. Annual Symposium on Foundations of Computer Science, 1975, pp. 75-84.
[59] K. Mehlhorn and S. Näher, Bounded ordered dictionaries in O(log log N) time and O(n) space, Information Processing Letters, vol. 35, pp. 183-189, 1990.
[60] R.H.J.M. Otten, What is a Floorplan?, in Proc. ISPD 2000, 2000, pp. 212-217.
[61] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, Rectangle-Packing-Based Module Placement, in Proc. ICCAD, 1995, pp. 472-479.
[62] P.-N. Guo, C.-K. Cheng, and T. Yoshimura, An O-Tree Representation of Non-Slicing Floorplan and Its Applications, in Proc. DAC'99, 1999, pp. 268-273.
[63] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani, Module Placement on BSG-Structure and IC Layout Applications, in Proc. ICCAD'96, 1996, pp. 484-491.
[64] T. Takahashi, A New Encoding Scheme for Rectangle Packing Problem, in Proc. ASP-DAC 2000, 2000, pp. 175-178.
[65] S. Sahni and A. Bhatt, The Complexity of Design Automation Problems, in Proc. IEEE/ACM Design Automation Conference, June 1980, pp. 402-411.
[66] R.H.J.M. Otten, Automatic Floorplan Design, in Proc. DAC'82, 1982, pp. 261-267.
[67] D.F. Wong and C.L. Liu, A New Algorithm for Floorplan Design, in Proc. DAC'86, 1986, pp. 101-107.
[68] D.W. Jepsen and C.D. Gelatt Jr., Macro Placement by Monte Carlo Annealing, in Proc. IEEE International Conference on Computer Design, 1983, pp. 495-498.
[69] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, B*-Trees: A New Representation for Non-Slicing Floorplans, in Proc. Design Automation Conference, 2000, pp. 458-463.
[70] Y. Pang, F. Balasa, K. Lampaert, and C.-K. Cheng, Block placement with symmetry constraints based on the O-tree non-slicing representation, in Proc. Design Automation Conference, 2000, pp. 464-467.
[71] D.E. Knuth, Selected Papers on Computer Science (CSLI Lecture Notes, No. 59), CSLI Publications, June 1996.
[72] F. Balasa, Modeling Non-Slicing Floorplans with Binary Trees, in Proc. International Conference on Computer Aided Design, 2000, pp. 13-16.
[73] X. Hong, G. Huang, Y. Cai, J. Gu, S. Dong, C.-K. Cheng, and J. Gu, Corner Block List: An Effective and Efficient Topological Representation of Non-Slicing Floorplan, in Proc. International Conference on Computer Aided Design, 2000, pp. 8-12.
[74] K. Fujiyoshi and H. Murata, Arbitrary Convex and Concave Rectilinear Block Packing Using Sequence-Pair, in Proc. ISPD'99, 1999, pp. 103-110.
[75] M.Z. Kang and W.W-M. Dai, Arbitrary Rectilinear Block Packing Based on Sequence Pair, in Proc. ICCAD'98, 1998, pp. 259-266.
[76] J. Xu, P.-N. Guo, and C.-K. Cheng, Rectilinear Block Placement Using Sequence Pair, in Proc. ISPD, 1998, pp. 173-178.
[77] S. Nakatake, M. Furuya, and Y. Kajitani, Module placement on BSG-structure with pre-placed modules and rectilinear modules, in Proc. ASP-DAC'98, 1998, pp. 571-576.
[78] K. Sakanushi, S. Nakatake, and Y. Kajitani, The Multi-BSG: Stochastic Approach to an Optimum Packing of Convex-Rectilinear Blocks, in Proc. ICCAD'98, 1998, pp. 267-274.
[79] H. Murata, K. Fujiyoshi, T. Watanabe, and Y. Kajitani, A Mapping from Sequence-Pair to Rectangular Dissection, in Proc. ASP-DAC'97, 1997, pp. 625-633.
[80] M.J.M. Pelgrom, A.C.J. Duinmaijer, and A.P.G. Welbers, Matching Properties of MOS Transistors, IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433-1440, October 1989.
[81] C. Lin and D.M.W. Leenaerts, A New Faster Sequence Pair Algorithm, in Proc. ISCAS 2000, May 2000, vol. III, pp. 407-410.
[82] J.W. Hunt and T.G. Szymanski, A Fast Algorithm for Computing Longest Common Subsequences, Communications of the ACM, vol. 20, no. 5, pp. 350-353, March 1977.
[83] X. Tang, R. Tian, and D.F. Wong, Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation, in Proc. DATE 2000, 2000, pp. 106-111.
[84] D.E. Knuth, The Art of Computer Programming, vol. 3, Addison-Wesley Publishing Company, edition, 1989.
[85] T. Takahashi, An Algorithm for Finding a Maximum-Weight Decreasing Sequence in a Permutation, Motivated by Rectangle Packing Problem, Tech. Rep. IEICE, vol. VLD96, no. 201, pp. 31-35, 1996.
[86] E. Mäkinen, On the longest upsequence problem for permutations, Tech. Rep. A-1999-7, University of Tampere, Finland, 1999.
[87] M.-S. Chang and F.-H. Wang, Efficient algorithms for the maximum weight clique and maximum weight independent set problems on permutation graphs, Information Processing Letters, vol. 43, pp. 293-295, 1992.
[88] W.L. Hsu, Maximum weight clique algorithms for circular-arc graphs and circle graphs, SIAM J. Comput., vol. 14, pp. 224-231, 1985.
[89] P. Beame and F. E. Fich, Optimal Bounds for the Predecessor Problem, in Proc. STOC'99, 1999, pp. 295-304.
[90] G. Ramalingam and T. Reps, On the computational complexity of dynamic graph algorithms, Theoretical Computer Science, vol. 158, pp. 233-277, 1996.
[91] G. Ramalingam and T. Reps, An Incremental Algorithm for a Generalization of the Shortest-Path Problem, Journal of Algorithms, vol. 21, pp. 267-305, 1996.
[92] D. Frigioni, M. Ioffreda, U. Nanni, and G. Pasqualone, Experimental Analysis of Dynamic Algorithms for the Single Source Shortest Path Problem, in Proc. Workshop on Algorithm Engineering, 1997, pp. 54-63.
[93] K. Kozminski, MCNC benchmark data, in International Workshop on Layout Synthesis 1990, 1990, http://www.cbl.ncsu.edu/CBL_Docs/lys90.html.
[94] S. Nakatake, Y. Kubo, and Y. Kajitani, Consistent Floorplanning with Super Hierarchical Constraints, in Proc. ISPD'01, 2001, pp. 144-149.
[95] X. Tang, Constrained Sequence-Pair-Based Placement, private communication, 2001.
[96] J.L. Ganley, Geometric Interconnection and Placement Algorithms, Ph.D. thesis, University of Virginia, 1995.
[97] M.R. Garey and D.S. Johnson, The rectilinear Steiner tree problem is NP-complete, SIAM J. Appl. Math., vol. 32, pp. 826-834, 1977.
[98] M. Hanan, On Steiner's problem with rectilinear distance, J. SIAM Appl. Math., vol. 14, pp. 255-265, 1966.
[99] E.W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik, vol. 1, pp. 269-271, 1959.
[100] N.J. Nilsson, Principles of Artificial Intelligence, Tioga Publishing Company, Palo Alto, CA, 1980.
[101] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, The Addison-Wesley Series in Artificial Intelligence. Addison-Wesley, Reading, Mass., 1984.
[102] R.C. Prim, Shortest connection networks and some generalizations, Bell System Technical Journal, vol. 36, pp. 1389-1401, 1957.
[103] H.-P. Tseng, Detailed Routing Algorithms for VLSI Circuits, Ph.D. thesis, University of Washington, Seattle, 1997.
[104] C. Chiang, M. Sarrafzadeh, and C.K. Wong, Global Routing Based on Steiner Min-Max Trees, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 12, pp. 1318-1325, 1990.
[105] C.Y. Lee, An algorithm for path connections and its applications, IRE Transactions Electronic Computers, vol. EC-10, no. 3, pp. 346-365, 1961.
[106] K. Kanchanasut, A shortest-path algorithm for Manhattan graphs, Information Processing Letters, vol. 49, pp. 21-25, 1994.
[107] T. Matsumoto, N. Saigan, and K. Tsuji, Two new efficient approximation algorithms for the Steiner tree problem in rectilinear graphs, in Proc. Int. Symp. on Circuits and Systems, June 1991, vol. 2 of 5, pp. 1156-1159.
[108] E. Malavasi and A. Sangiovanni-Vincentelli, Area routing for analog layout, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 8, pp. 1186-1197, August 1993.
[109] T. Adler and E. Barke, Single step current driven routing of multiterminal signal nets for analog applications, in Proc. Design, Automation and Test in Europe Conference and Exhibition 2000, 2000, pp. 446-450.
[110] L.-C.E. Liu and C. Sechen, Multilayer Chip-Level Global Routing Using an Efficient Graph-Based Steiner Tree Heuristic, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 10, pp. 1442-1451, October 1999.
[111] U. Choudhury and A. Sangiovanni-Vincentelli, Use of Performance Sensitivities in Routing of Analog Circuits, in Proc. International Symposium on Circuits and Systems, 1990, vol. 1, pp. 348-351.
[112] J. Cong and P.H. Madden, Performance Driven Multi-Layer General Area Routing for PCB/MCM Designs, in Proc. Design Automation Conference, 1998, pp. 356-361.
[113] J.P. Cohoon and D.S. Richards, Optimal two-terminal wire routing, Integration: the VLSI Journal, vol. 6, pp. 35-57, 1988.
[114] S.Q. Zheng, J.S. Lim, and S.S. Iyengar, Finding Obstacle-Avoiding Shortest Paths Using Implicit Connection Graphs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 1, pp. 103-110, January 1996.
[115] D. Warme, GeoSteiner extensions for Steiner minimal trees in graphs, private communication, 2000.
[116] R. Dechter and J. Pearl, Generalized Best-First Search Strategies and the Optimality of A*, Journal of the Association of Computing Machinery, vol. 32, no. 3, pp. 505-536, July 1985.
[117] H. Zhou, N. Shenoy, and W. Nicholls, Efficient Minimum Spanning Tree Construction without Delaunay Triangulation, in Proc. ASP-DAC 2001, 2001.
[118] F.K. Hwang, On Steiner minimal trees with rectilinear distance, SIAM Journal of Applied Mathematics, vol. 30, no. 1, pp. 104-114, 1976.
[119] I.I. Măndoiu, V.V. Vazirani, and J.L. Ganley, A new heuristic for rectilinear Steiner trees, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1129-1139, October 2000.
[120] H. Takahashi and A. Matsuyama, An approximate solution for the Steiner problem in graphs, Math. Japonica, vol. 24, no. 6, pp. 573-577, 1980.
[121] V.J. Rayward-Smith and A. Clare, On finding Steiner vertices, Networks, vol. 16, pp. 283-294, 1986.
[122] P. Winter and J. MacGregor Smith, Path-distance heuristics for the Steiner problem in undirected networks, Algorithmica, vol. 7, pp. 309-327, 1992.
[123] E.P. Huijbregts, A Complete Design Path for the Layout of Flexible Macros, Ph.D. thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 1996.
[124] V.J. Rayward-Smith, The computation of nearly minimal Steiner trees in graphs, Int. J. Math. Educ. Sci. Technol., vol. 14, pp. 15-23, 1983.
[125] D.M. Warme, P. Winter, and M. Zachariasen, GeoSteiner 3.0, http://www.diku.dk/geosteiner, 1999.
[126] T. Sakurai, Closed-Form Expressions for Interconnection Delay, Coupling, and Crosstalk in VLSIs, IEEE Transactions on Electron Devices, vol. 40, no. 1, pp. 118-124, January 1993.
[127] K. Doris, C. Lin, and A.H.M. van Roermund, D/A Conversion: Amplitude and Time Error Mapping Optimization, in Proc. ICECS 2001, September 2001, pp. 863-866.
[128] R. Gharpurey, Modeling and Analysis of Substrate Coupling in Integrated Circuits, Ph.D. thesis, University of California, Berkeley, 1995.
[129] H. Veendrick, Deep-Submicron CMOS ICs: From basics to ASICs, Kluwer Academic Publishers, 2nd edition, 2000.
[130] K. Joardar, A simple approach to modeling cross-talk in integrated circuits, IEEE Journal of Solid-State Circuits, vol. 29, pp. 1212-1219, 1994.
[131] L. Deferm, C. Claes, and G.J. Declerck, Two- and Three-Dimensional Calculation of Substrate Resistance, IEEE Transactions on Electron Devices, vol. 35, no. 3, pp. 339-352, March 1988.
[132] P.N. Parakh and R.B. Brown, Crosstalk Constrained Global Route Embedding, in Proc. ISPD'99, 1999, pp. 201-206.
[133] H. Zhou and D.F. Wong, Optimal River Routing with Crosstalk Constraints, ACM Transactions on Design Automation of Electronic Systems, vol. 3, no. 3, pp. 496-514, July 1998.
[134] G.E. Forsythe, M.A. Malcolm, and C.B. Moler, Computer methods for mathematical computations, Prentice-Hall, 1977.
[135] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 2nd edition, 1992.
Acknowledgement
A thesis is not complete without having given credit to those who have contributed to the contents and shape of this work, directly or indirectly. First of all I would like to thank my coach Dr. Domine Leenaerts for the fruitful discussions we had from which many unexpected and stimulating thoughts have bloomed. Also, his criticism has surely led to a higher quality of this research work. Moreover, I am very grateful to both Domine and my former advisor Prof. Wim van Bokhoven for giving me full research freedom; a condition that has made me enjoy this work even more. I would also like to express my gratitude to my current advisor Prof. Arthur van Roermund with whom I had many constructive in-depth discussions. Not only did he open my mind for different views towards my work, but he also motivated me to clarify several aspects which eventually helped me to improve my own understanding of the matter. My second advisor, Prof. Ralph Otten, has also been of great help in improving the quality of this thesis. My time in the Mixed-signal Microelectronics (MsM) research group would not have been so enjoyable without the presence of: Mrs. Linda Balvers (thanks for your support and kindness), Dr. Joost Briaire (thanks for the discussions), Dr. Hans Hegt (thank you for your openness), Mr. Piet Klessens (thanks for keeping the systems up-and-running and the oliebollen), and all other MsM members. There is one person who needs special mentioning because he was stuck with me in the same room for three years. Kostas Doris, thank you for being a nice companion and friend. Moreover, I appreciate your involvement, both in scientic as well as in social respects. I have had the pleasure to tutor two students who both did a great job on part of this research work. Mario Schehle did a substantial amount of work on routing algorithms, and Lennart Reus built a nice graphical user interface. Thanks guys! 
I would not be a worthy XBlast player if I ignored those numerous fun periods that lasted longer than they should have; time really flies when you are having fun. In the first place, I must thank the main author of the great XBlast game, Oliver Vogel, for making the lives of many Ph.D. students a whole lot more pleasant. I thank my fellow XBlasters, not only for giving me some points now and then, but also for quite some interesting discussions. In order of disappearance from the scene: Dr. Jurgen "nu voor het echie" van Engelen, Dr. Daniël "ik pak je" Schobben, Dr. Arno "wat doe je daar" van Leest, Dr. Eddine "ikel" Sarroukh. I believe that, as yet, it is still unclear who the real Master Blaster is. As with research, the game never ends.

In addition to the people in my vicinity, there are a few persons I am indebted to for their contribution to some important parts of this research work. These people are Dr. David Warme, Dr. Aart Blokhuis, and Dr. Thorsten Koch. Unfortunately, I cannot thank everyone explicitly, although I would like to do so. Therefore, to all persons who feel they should be mentioned in this acknowledgement, but are not: you know who you are.
I eagerly take this opportunity to express my gratitude towards my wife and my sons, who have endured many, many hours of my absent-mindedness and absence due to research work. I admit that a better optimization approach is needed here. Last but not least, I thank my parents for giving me the opportunity to choose, and for their support for my choices.

Chieh Lin
December 29, 2001
Curriculum Vitae
Chieh "Achie" Lin was born on December 2, 1972 in Ruian City, China. He received his diploma from the Bouwens van der Boijecollege in Panningen, the Netherlands, in 1991. In September 1991 he began his studies in Informatietechniek at the department of Electrical Engineering, Eindhoven University of Technology, the Netherlands. In June 1997 he received the Ingenieur (Ir.) degree from this institute. His final report was titled "Design and Implementation of SHANNI: a Stand-alone Hybrid Artificial Neural Network Implementation". Thereafter, he worked towards a Ph.D. degree in the Mixed-signal Microelectronics research group of the department of Electrical Engineering at Eindhoven University of Technology. Based on the work presented in this thesis, he expects to receive the Ph.D. degree on Wednesday, February 20, 2002. Since June 2001, he has been with Philips Research, Electronic Design & Tools. Currently he is focusing on the development of CAD tools (for analog simulation and synthesis), with emphasis on radio-frequency issues. This research was supported by the Dutch Organization for Scientific Research (NWO).