
Incremental Mixed-Signal Layout Generation Concepts

Theory & Implementation

Chieh Lin

Cover design: Chieh Lin

Incremental Mixed-Signal Layout Generation Concepts


Theory & Implementation

PROEFSCHRIFT

ter verkrijging van de graad van doctor aan de Technische Universiteit Eindhoven, op gezag van de Rector Magnificus, prof.dr. R.A. van Santen, voor een commissie aangewezen door het College voor Promoties in het openbaar te verdedigen op woensdag 20 februari 2002 om 16.00 uur

door

Chieh Lin

geboren te Ruian City, China

Dit proefschrift is goedgekeurd door de promotoren: prof.dr.ir. A.H.M. van Roermund en prof.dr.ir. R.H.J.M. Otten. Copromotor: dr.ir. D.M.W. Leenaerts

© Copyright 2002 by Chieh Lin. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright holder.

CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN
Lin, Chieh
Incremental mixed-signal layout generation concepts: theory and implementation / by Chieh Lin. - Eindhoven : Technische Universiteit Eindhoven, 2002. Proefschrift. - ISBN 90-386-1880-8 NUGI 832
Trefw.: elektrische netwerken ; CAD / analoge geïntegreerde schakelingen / optimalisering ; algoritmen / routering / grafen ; algoritmen.
Subject headings: circuit layout CAD / mixed analogue-digital integrated circuits / circuit optimisation / algorithm theory / graph theory.

A few words of sincere appreciation to my wife Jie, and my sons Bo Yi and Qi Mo, for their understanding, faith and love.

promotiecommissie:

prof.dr.ir. W.M.G. van Bokhoven
prof.dr.ir. P. Dewilde
prof.dr.ir. G. Gielen
prof.dr.ir. P. Groeneveld
dr.ir. J.A. Hegt
dr.ir. D.M.W. Leenaerts
prof.dr.ir. R.H.J.M. Otten
prof.dr.ir. A.H.M. van Roermund
prof.ir. M.P.J. Stevens

Summary
The framework of this thesis encompasses the task of (automatically) generating a physical layout from a given circuit description including specifications. Typically, the performance of the circuit extracted from a layout is not equal to the circuit-level performance before physical design. The reason for this is that the physical design step adds undesirable parasitic components to the original circuit, in effect causing a deviation from the wanted circuit behavior. The ultimate goal is to minimize this deviation with respect to the specifications so that proper operation of the overall design can be warranted. Due to the complexity of the overall task, we mainly focus on the placement and routing problem. The class of circuits under consideration is a particularly difficult one, containing circuits with both analog and digital functionality operating at high frequencies. In this environment, physical phenomena such as crosstalk and process variations are most pronounced. Therefore, innovative automatic layout techniques are required that can produce high-quality layouts while considering all these second-order effects.

The most important ingredients to tackle the placement and routing problem successfully are: adequate data structures, appropriate models/representations and efficient algorithms. Moreover, since placement and routing are extremely difficult, strongly coupled problems, heuristic methods are employed to find near-optimal solutions. In this thesis, the simulated annealing algorithm is used as a general-purpose stochastic optimization engine. As this algorithm uses a massive number of iterations before converging to a final solution, it is worthwhile to reduce the amount of computed information during each iteration. This is the philosophy behind all novel techniques and algorithms presented in this thesis: use and compute only strictly necessary information to find a solution as efficiently as possible. In other words, we adopt an incremental approach.

The abstract representation we use for block placement is the well-known sequence-pair structure. The routing space is modeled using a sparsified grid-graph which is derived directly from a placement. This graph is used in conjunction with a variety of multi-pin routing heuristics, for which we also developed visualization tools. During placement, substrate coupling is efficiently taken into account using a novel approach combining the corner-stitching data structure and geometric techniques.

We show that incremental placement techniques can substantially reduce computation time. The theoretical analyses are backed up by experimental results. Noticeably, the obtained gain increases with the problem instance size. A new method for constructing a dynamic global routing graph from a given placement is presented. This graph can be employed effectively in an incremental environment. Furthermore, several graph-based routing heuristics are benchmarked. Significant improvements of new routing heuristics over existing ones are demonstrated over a broad range of synthesized problem instances. Also, incremental techniques that can be applied to these routing heuristics are described in the context of placement perturbations. Finally, we show that substrate-aware placement can also be exploited in an incremental environment without inducing significant computational overhead.

Samenvatting
Het kader van dit proefschrift omvat het automatisch genereren van een fysieke layout uitgaande van een gegeven circuitbeschrijving inclusief specificaties. Normaliter is het gedrag van een circuit dat geëxtraheerd is van een layout afwijkend van het originele circuitgedrag. De reden hiervoor is dat een layout additionele (parasitaire) componenten toevoegt aan het originele circuit, hetgeen resulteert in afwijkend circuitgedrag. Het uiteindelijke doel van een goede layout is het minimaliseren van deze afwijkingen ten aanzien van de opgegeven specificaties, teneinde de correcte werking van het totaalontwerp te kunnen waarborgen. Ten gevolge van de complexiteit van deze taak richten we ons hoofdzakelijk op het plaatsings- en bedradingsprobleem. We beschouwen een aparte, bijzonder moeilijke klasse van circuits, namelijk circuits die zowel analoge als digitale functionaliteit bevatten. In deze context spelen verschijnselen zoals overspraak en procesvariaties een zeer belangrijke rol. Derhalve zijn innovatieve automatische layouttechnieken noodzakelijk om kwalitatief hoogwaardige layouts te genereren waarin deze tweede-orde-verschijnselen meegenomen worden.

De meest belangrijke ingrediënten die nodig zijn om het plaatsings- en bedradingsprobleem adequaat op te lossen zijn: geavanceerde datastructuren, efficiënte modellen/representaties, en efficiënte algoritmen. Aangezien het plaatsings- en bedradingsprobleem extreem moeilijke, sterk gekoppelde problemen zijn, gebruiken we heuristische methoden om (bijna-)optimale oplossingen te vinden. In dit proefschrift wordt het zogenaamde simulated annealing-algoritme gebruikt als een algemene stochastische optimalisatie-aanpak. Daar dit algoritme een zeer groot aantal iteraties nodig heeft om te convergeren naar een eindresultaat, is het van belang het aantal benodigde rekenkundige operaties drastisch te reduceren. Dit laatste is in feite de filosofie achter alle nieuwe algoritmen en technieken die in dit proefschrift aan de orde komen; probeer slechts die informatie te gebruiken en te berekenen die strikt noodzakelijk is om zo efficiënt mogelijk tot een (eind)oplossing te komen. Met andere woorden, we hanteren een incrementele aanpak.

De abstracte representatie die wordt gehanteerd voor blokplaatsing is de welbekende sequence pair-structuur. De bedradingsruimte wordt gemodelleerd door middel van een ijle gridgraaf welke direct wordt afgeleid van een blokplaatsing. Deze graaf wordt gebruikt in combinatie met verscheidene multi-pin-bedradingsheuristieken waarvoor we tevens visualisatie-gereedschappen hebben ontwikkeld. Substraatkoppeling wordt op een efficiënte wijze in de plaatsing verdisconteerd met gebruikmaking van de zogenaamde corner stitching-datastructuur en geometrische technieken.

We tonen aan dat incrementele plaatsingstechnieken kunnen leiden tot een substantiële reductie van de rekentijd. De theoretische analyses worden ondersteund door experimentele resultaten. De winst die op deze manier kan worden behaald, neemt toe naarmate de probleeminstanties in kardinaliteit toenemen. Tevens presenteren we een nieuwe methode waarmee een dynamische globale bedradingsgraaf uit een gegeven blokplaatsing verkregen kan worden. Deze graaf kan op effectieve wijze gebruikt worden in een incrementele omgeving. Verder voeren we uitvoerige vergelijkende testen uit met verscheidene graafgeoriënteerde bedradingsheuristieken. We laten zien dat enkele van de nieuwe bedradingsheuristieken significant beter presteren, gemeten over een vrij grote verzameling gesynthetiseerde probleeminstanties. Tevens wordt beschreven hoe incrementele technieken toegepast kunnen worden op de bedradingsheuristieken teneinde ze efficiënter te laten opereren in de context van plaatsingsperturbaties. Tenslotte laten we zien hoe op een efficiënte incrementele manier substraatkoppelingen meegenomen kunnen worden in de uiteindelijke layout, zonder de computationele complexiteit significant te verhogen.

Contents
Summary  v
Samenvatting  vii
List of Abbreviations  xiii

1 Introduction  1
  1.1 Background  1
  1.2 State of the Art  3
  1.3 Motivation  4
  1.4 Goals of this Research Work  4
  1.5 Thesis Outline  5
  1.6 Main Contributions of this Work  6

2 Problem Definition  9
  2.1 Top-Down Flow and Bottom-Up Approach  9
    2.1.1 A VLSI Design Cycle  10
    2.1.2 Physical Design  11
    2.1.3 Mixed-Signal Layout Styles  13
    2.1.4 From Circuit to Layout  14
    2.1.5 Layout System Requirements  15
  2.2 The Mapping Problem  16
    2.2.1 High-Level Specifications  16
    2.2.2 Layout System Specifications  16
    2.2.3 Constraint Mapping Problem  17
    2.2.4 High-Level Sensitivities  17
    2.2.5 Lower Level Sensitivities  18
    2.2.6 Sensitivity Computation Problem  18
  2.3 Placement and Routing Constraints  19

3 Optimization Methods  21
  3.1 VLSI Optimization Methods  21
    3.1.1 Deterministic Algorithms  22
    3.1.2 Stochastic Algorithms  23
    3.1.3 Heuristic Algorithms  24
  3.2 Simulated Annealing  25
    3.2.1 Basic SA Algorithm  25
    3.2.2 Problem Representation  27
    3.2.3 Perturbation Operators  28
    3.2.4 Acceptance and Generation Functions  28
    3.2.5 Temperature Schedule  29
    3.2.6 Stop Criterion  30
    3.2.7 Cost Function  30
  3.3 Concluding Remarks  30

4 Optimization Approach Based on Simulated Annealing  33
  4.1 Optimization Flow  34
  4.2 Problem Representation  36
    4.2.1 Placement  36
    4.2.2 Routing  36
    4.2.3 Substrate Coupling  38
  4.3 Perturbation Operators  38
  4.4 Acceptance and Generation Functions  40
  4.5 Temperature Schedule  40
  4.6 Stop Criterion  41
  4.7 Cost Function  41
    4.7.1 Implicit Cost Evaluation  41
  4.8 Concluding Remarks  42

5 Efficient Algorithms and Data Structures  43
  5.1 Computational Model  44
  5.2 Asymptotic Analysis  44
  5.3 Computational Complexity  45
  5.4 Data Structures for CAD  46
    5.4.1 Corner Stitching  46
    5.4.2 Linked List  48
    5.4.3 Splay Tree  50
    5.4.4 Hash Table  51
    5.4.5 Priority Queue  54
    5.4.6 Other Advanced Data Structures  55
  5.5 Concluding Remarks  55

6 Placement  57
  6.1 Previous Work  59
  6.2 Effective and Efficient Placement  61
  6.3 Representation Generality, Flexibility and Sensitivity  63
  6.4 Sequence Pair Representation  66
  6.5 Graph-Based Packing Computation  70
    6.5.1 Relative Placement Computation  70
    6.5.2 An Efficient Relative Placement Algorithm  76
    6.5.3 Absolute Placement Computation  77
  6.6 Non-Graph-Based Packing Computation  81
    6.6.1 Maximum-Weight Common Subsequence (MWCS) Problem  81
    6.6.2 Maximum-Weight Monotone Subsequence (MWMS) Problem  83
  6.7 Graph-Based Incremental Placement Computation  87
    6.7.1 Incremental Relative Placement Computation  88
    6.7.2 Incremental Relative Placement Computational Complexity  96
    6.7.3 Incremental Absolute Placement Computation  97
    6.7.4 Incremental Absolute Placement Computational Complexity  99
    6.7.5 Average Incremental Computational Complexity  105
  6.8 Implementation Considerations  106
  6.9 Experimental Results  106
    6.9.1 A Single Iteration  107
    6.9.2 Packing Optimization  108
    6.9.3 Conclusions  111
  6.10 Placement-to-Sequence-Pair Mapping  112
  6.11 Constrained Block Placement  116
    6.11.1 Non-Graph-Based Constrained Placement  116
    6.11.2 Implementation Considerations  119
    6.11.3 Experimental Results on Non-Graph-Based Constrained Block Placement  120
    6.11.4 Incremental Graph-Based Constrained Placement  123
  6.12 Concluding Remarks  124

7 Routing  127
  7.1 The Routing Problem  128
  7.2 Classification of Routing Approaches  129
    7.2.1 Routing Hierarchy  130
    7.2.2 Routing Model  132
  7.3 Previous Work  133
  7.4 Computational Complexity  134
  7.5 Global Routing Model  135
    7.5.1 Model Efficiency  135
    7.5.2 Global Routing Graph Computation  136
    7.5.3 Supporting Dynamic Changes  137
  7.6 Global Routing Algorithms  139
    7.6.1 Two-pin Routing Algorithms  140
    7.6.2 Minimal Bounding Box (MBB) Routing  144
    7.6.3 Minimum Spanning Tree (MST) Routing  145
    7.6.4 Path-Based Routing  146
    7.6.5 Node-Based Routing  151
  7.7 Benchmarking of Heuristics in Our Routing Model  153
    7.7.1 Benchmark Problem Instances  153
    7.7.2 Experimental Results  154
    7.7.3 Concluding Remarks  159
  7.8 Incremental Routing  160
    7.8.1 Re-routing Nets Connected to Moved Modules  160
    7.8.2 Re-routing Affected Nets Not Connected to Moved Modules  164
  7.9 Impact of Routing on Placement Quality  166
    7.9.1 Integrated Placement and Routing  166
    7.9.2 Experimental Results  167
    7.9.3 Conclusions  169
  7.10 Concluding Remarks  169

8 Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations  171
  8.1 Previous Work  172
  8.2 Efficiency and Accuracy Requirements  172
  8.3 Self-Parasitics  172
    8.3.1 Wire Resistance, Capacitance and Inductance  173
    8.3.2 Via Resistance and Area  173
  8.4 Crosstalk  174
    8.4.1 Substrate Coupling  174
    8.4.2 Parasitic Coupling Capacitance  176
  8.5 Process Variations  176
  8.6 Incorporating Crosstalk and Parasitics into Routing  177
  8.7 Incorporating Substrate Coupling into Placement  177
    8.7.1 A Basic Module  178
    8.7.2 Generalized 2-Dimensional Substrate Coupling Model  179
    8.7.3 Substrate Coupling Impact Minimization  180
    8.7.4 An Efficient Substrate Coupling Impact Minimization Algorithm  182
    8.7.5 Implementation Considerations  182
    8.7.6 Experimental Results  183
    8.7.7 Conclusions  184
  8.8 Incremental Substrate Coupling Impact Minimization  184
  8.9 Concluding Remarks  185

9 Conclusions and Directions for Future Research  187
  9.1 Conclusions  187
  9.2 Directions for Future Research  188

Bibliography  189
Acknowledgement  201
Curriculum Vitae  203

List of Abbreviations

ADBH   Average-Distance-Based Heuristic
ADH    Average-Distance Heuristic
BSG    Bounded Sliceline Grid
CA     Chip Area
CAD    Computer-Aided Design
CI     Coupling Impact
CMWCS  Constrained Maximum-Weight Common Subsequence
DAG    Directed Acyclic Graph
DV     Direct View
EDA    Electronic Design Automation
ESMT   Euclidian Steiner Minimal Tree
FPGA   Field-Programmable Gate Array
GSMT   Graph Steiner Minimal Tree
IC     Integrated Circuit
ILP    Incremental Longest Paths
IP     Intellectual Property
ISPBH  Iterated Shortest-Paths-Based Heuristic
LCS    Longest Common Subsequence
LD     Left-Down
LOT    Labeled Ordered Tree
MBB    Minimal Bounding Box
MST    Minimum Spanning Tree
MWCS   Maximum-Weight Common Subsequence
MWMS   Maximum-Weight Monotone Subsequence
NSPE   Non-Slicing Placement Evaluation
RAM    Random-Access Machine
RSMT   Rectilinear Steiner Minimal Tree
SA     Simulated Annealing
SCIM   Substrate Coupling Impact Minimization
SMT    Steiner Minimal Tree
SP     Sequence Pair
SPBH   Shortest-Paths-Based Heuristic
SPH    Shortest-Paths Heuristic
SSSP   Single-Source Shortest Paths
VLSI   Very Large Scale Integration
WL     Wire Length


Chapter 1

Introduction
The purpose of this chapter is to give the reader an idea of the framework in which this work is situated. Moreover, the need for automating high-quality layout generation for mixed-signal designs is clarified. Motivated by this need, the objectives of this work are defined. Furthermore, an overview of the remaining chapters is given, concluded by the main contributions of this work.

1.1 Background
The design of integrated circuits has been an actively exploited area for almost half a century already. The possibility to integrate a plethora of functions onto a small piece of semiconductor material has enabled the development of many high-tech systems, e.g. the modern personal computer. Without exaggeration, one can state that without the invention of integrated circuits, the world would not be as it is today. With improvements in manufacturing technologies, the integration density of components within a single integrated circuit (IC) has also increased dramatically. The exponentially growing trend in the number of components in an IC as a function of time still seems to hold, and is expected to hold for at least another decade [1]. Figure 1.1 depicts this trend graphically. This trend is better known as Moore's Law. Within this figure, a few keywords clarify some important trends. A very noticeable effect is that with increasingly smaller feature sizes and larger designs, the intrinsic speed of transistors increases, but the (global) wire delays also increase. With this trend, a vast area arose dedicated to the integration of circuits, which is called the field of Very Large Scale Integration (VLSI).

Increasing the number of components in a given area has an obvious cost benefit, because the number of produced ICs per time unit increases when other factors are kept constant. However, as always, there is also a problematic dark side that comes with higher integration. The problems are even more pronounced due to the higher operating frequencies of current systems. Roughly speaking, one part of the problems is related to the percentage of working ICs, which is called yield. Yield is a complicated factor that has links with many aspects of VLSI technology: from system design to circuit design, to layout design, to technology. Moreover, due to smaller sizes, accuracy and power dissipation problems emerge. Typically, most of these performance factors can be traded off against each other. The other part consists of the increasing influence of unavoidable parasitic elements such as parasitic resistances, capacitances, and inductances. Parasitic substrate coupling and, to a lesser extent, electromagnetic coupling can also no longer be neglected. Simply stated, the non-ideal behavior of semiconductor material starts affecting the functionality of the IC in such a way that measures have to be taken to ensure good functioning.

Figure 1.1: The National Technology Roadmap for Semiconductors 1998. [Technology trends: transistor size (nm), ASIC chip size (cm²) and the number of interconnect layers, plotted against year (1997–2014) and technology node (0.25 µm down to 0.035 µm).]

The correct functioning of a system is especially susceptible to these parasitic phenomena in the case of mixed-signal designs, where both (insensitive and noisy) digital and (sensitive) analog building blocks are present on the same chip. Practical experience has been used, and is still used, to limit the adverse effects of non-idealities. However, due to the high complexity of VLSI systems it is an immensely difficult task to handle all problems adequately, even for an expert designer. This is where the computer comes into play. When a computer is used properly, it is able to handle large amounts of data and process it in such a way that the generated output satisfies certain given specifications. The use of computers in a design task is called Computer-Aided Design (CAD). A more appropriate term in connection with computer-aided design of ICs is Electronic Design Automation (EDA), which includes electronic CAD tools but is more general. The purpose of a CAD tool is to support the designer during the process of realizing an IC.

The final physical outcome of the design process is a disc of silicon, called a wafer, which consists of a number of more or less identical copies of the same integrated circuit. This wafer is then cut, resulting in a set of dies, each one of them containing the same integrated circuit. The creation of such a wafer is accomplished using a set of masks which are used to deposit several layers of different materials onto the wafer. The task of organizing geometric information in this context, resulting in an answer to the question which materials have to be put where on the wafer, is called physical design. The end result of a physical design step is called a layout, which is essentially a set of masks that comply with given design rules. The terms layout synthesis and layout generation are also used frequently in the same context. Although layout generation is a very important step in the VLSI process, it is only one of many steps. Some other important steps are circuit design, simulation and verification.

Our primary concern in this work is the layout generation step of VLSI design. More specifically, we deal with a remarkably interesting and challenging subclass consisting of both analog and digital ingredients. In essence, layout generation is accomplished by solving two strongly coupled problems under a set of constraints. These two problems are known as the placement and the routing problem.

1.2 State of the Art


The design of integrated circuits is not new, and therefore most problems associated with layout generation are not new either. However, most of them became real problems when feature sizes approached submicron and deep-submicron dimensions and operating frequencies made a leap forward. Historically, the roots of algorithmic approaches to designing layouts lie in the digital area; it is there that circuits grew extraordinarily fast to incredibly large sizes. Therefore, layout design automation was first investigated and employed in that area. However, contemporary mixed-signal designs also require the use of computers, due to the large number of phenomena that have to be taken into account in order to comply with the specifications. Larger designs, tighter specifications and more interdependencies have led to increased complexity. The layout problem was boosted again recently by the observation that the layout problem for digital designs should be looked at through analog glasses: essentially, the phenomena that form bottlenecks in digital layout design are analog in nature.

An illustrative overview is given here which should give a good impression of the work that has been and is being carried out in the VLSI layout generation domain, especially in the mixed-signal and analog fields. By no means does this overview intend to be complete. It only provides a representative sampling of existing mixed-signal and analog layout generation systems, in order to give an impression of the state of the art in this field, and to illustrate the variety of ways in which typical constraints are handled and problems are approached.

In the works [2, 3], all conducted at the University of California at Berkeley, several techniques are introduced for performance-driven layout of analog integrated circuits. The basic concepts are sensitivity computation, modeling of performance constraints, and performance-driven placement, routing and compaction. The approach is only suitable for small analog integrated circuits, because the system behavior scales very badly with increasing problem size. Furthermore, the circuits that can be handled are assumed to be linear. Examples of such circuits are operational amplifiers and filters. In [4] the above approach is extended and incorporated into a top-down constraint-driven design methodology for analog integrated circuits.

Researchers from Carnegie Mellon University have been quite successful with their tools ASTRX/OBLX [5] and KOAN/ANAGRAM [6, 7]. Recently, the latter has been commercialized by a startup company called Neolinear (http://www.neolinear.com). The tools are known to be successful when applied to linear analog circuits such as filters and operational amplifiers.

Also, the Catholic University of Leuven has contributed significantly to the development of analog CAD tools. In [8] analog layout generation for performance and manufacturability is described in detail. The system employs a top-down hierarchical design methodology in which the explicit generation of a specific set of low-level constraints is avoided. Instead, the layout tools are driven more or less directly by higher-level performance constraints via pre-determined performance sensitivity values. An implicit assumption of this approach is that the circuit is sufficiently linear in the region in which the layout parameters under consideration have influence. The ultimate goal is a layout that satisfies all performance constraints by construction.

Several other groups and researchers have attempted to transfer the methodologies used in the digital VLSI domain to the analog and mixed-signal VLSI domain, but most of these approaches have not been very successful as yet. The main reason is that digital approaches rely on certain assumptions (made to reduce complexity) that are simply unacceptable in the analog domain. A good example is the consideration of only a critical path to determine the quality of wiring.

1.3 Motivation
From the previous discussion it should be clear that VLSI design consists of several tasks which are very hard to solve properly. The design step that considers layout generation is also extremely difficult, and the quality of a layout is of utmost importance in any IC design. Thus far, only a few researchers have concentrated on layout generation for mixed-signal designs. Although layout generators are known for analog designs, those systems are usually not suitable for general application to mixed-signal designs, where the layout problems are worst. No single layout generator is best across the board: all existing approaches and systems have fundamental limitations and weaknesses. Current layout generation systems suffer from at least one of the following problems.

- Placing objects (sub-circuit modules) in a two-dimensional plane is performed poorly with respect to wiring quality.
- Only a subset of mixed-signal design constraints is (or can be) taken into account during placement and routing.
- Ad hoc solutions are used which are not very robust and require a significant amount of (problem-dependent) tuning effort.
- Scalability properties are poor due to inefficient modeling and/or implementation.

Thus, the necessity of improved layout generation concepts and systems is clear.

1.4 Goals of this Research Work


The goal of this thesis is to establish methodologies and concepts to automate the layout generation step within the framework of mixed-signal VLSI design. In principle there are two ways to approach the problem of generating high-quality mixed-signal layouts. One approach is to keep refining digital techniques, taking into account increasingly important second-order effects, so as to satisfy mixed-signal and analog requirements. The other approach is to review the physical design problem anew in order to find fundamentally more efficient means to tackle it. For instance, a more efficient formal representation could contribute to this, eventually leading to better solutions in less time, combined with better scalability¹ performance. The latter approach is more interesting from a scientific point of view.

The established methodology and concepts should be practically useful, demonstrated by simulation results. Scalability and generality of the approach are also a major concern. These features should be clear from theory, preferably backed up by experimental results.

1.5 Thesis Outline


Hereafter, the main topics which are covered in this thesis are described briefly. First, the problem we want to solve is described in detail in Chapter 2, from a system view down to assessable subproblems. The inputs and outputs of the layout system are also defined explicitly. It is made clear that various mapping problems need to be solved to establish a transparent interface between real-world specifications and desired specifications.

Then, in Chapter 3, an overview of existing optimization methods is given which are possible candidates for VLSI-related problems. Based on our requirements and previous results on similar problems, we indicate which optimization method is preferred. Consequently, in Chapter 4 we focus on the chosen optimization method and explore it in more detail, describing its properties and its application to our problem representation. Thus, the most important aspects of our optimization approach are discussed.

We then clarify the impact of efficient algorithms and data structures on the performance of the overall system in Chapter 5. Moreover, scalability issues in connection with the efficiency of algorithms and their data structures are discussed.

One of the main topics in this thesis is the problem of optimally placing a set of objects in a two-dimensional plane under certain constraints. Chapter 6 discusses this problem in depth and an efficient solution is proposed. Theoretical analyses are carried out and compared with existing results. The chapter also gives an overview of all currently known approaches in the proposed placement context, linking several related but strictly mathematically oriented fields of research to the placement problem. Experimental results are obtained which confirm the theory.

As will be made clear at a later stage, placement of objects does not have any practical use if routing is ignored. Therefore, in Chapter 7 we explore routing issues within the same optimization framework, and put a significant amount of effort into establishing a routing methodology. Apart from this, existing results are compared with new experimental results and discrepancies in previous approaches are exposed.

Chapter 8 covers the problems of non-ideal effects in a layout. Both crosstalk and parasitics are well-known culprits of performance degradation in high-frequency designs. In mixed-signal circuits these phenomena manifest themselves quite rapidly, as compared to fully digital designs with large noise margins or fully analog designs with less noisy components. An overview is given of existing methods to tackle problems due to crosstalk and parasitics. Furthermore, a method is proposed to incorporate substrate coupling into the optimization framework in a most efficient manner.

Since algorithmic and representation efficiency are serious concerns throughout this research work, a major part of this thesis discusses fundamental concepts to improve efficiency.
¹ The term scalability is used here to denote the behavior of a computing system when the problem instance size under consideration increases.


As these concepts are tightly coupled with a certain problem, it is more convenient to introduce and elaborate on them while discussing the underlying problem. A fundamental concept to improve efficiency is incremental computation. In essence, the idea is to compute new information only when it is strictly necessary. We show that this approach leads to fundamental improvements in placement and global routing efficiency in the adopted stochastic (simulated annealing) optimization framework. Note that higher computational efficiency automatically implies better scalability properties. In each chapter, where appropriate, experimental results are given after the discussion of the respective algorithms. Furthermore, we have attempted to describe and present the experiments (and their results) in such a way that comparison with existing works is not hampered. We end this thesis with the main conclusions and directions for future research in Chapter 9.
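To make the incremental philosophy concrete, the following is a minimal sketch, assuming a cost function that decomposes into independent per-net terms (as wire-length estimates do). All names are illustrative; this is not the thesis's implementation. After a module moves, only the cost terms of the nets incident to that module are recomputed, instead of re-evaluating the whole cost function.

```python
# Minimal sketch of incremental cost evaluation (illustrative, not the
# thesis code).  Assumption: the total cost is a sum of independent
# per-net terms, so moving one module only invalidates the terms of the
# nets attached to that module.

class IncrementalCost:
    def __init__(self, nets, net_cost):
        self.nets = nets                      # net id -> set of module ids
        self.net_cost = net_cost              # callable: net id -> cost term
        self.cache = {n: net_cost(n) for n in nets}
        self.total = sum(self.cache.values())

    def move_module(self, module):
        """Re-evaluate only the nets incident to the moved module."""
        for n, members in self.nets.items():
            if module in members:             # net is affected by the move
                self.total -= self.cache[n]
                self.cache[n] = self.net_cost(n)
                self.total += self.cache[n]
        return self.total
```

A real system would additionally keep a module-to-nets index, so that the incident nets of a moved module are found directly instead of by scanning all nets.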

1.6 Main Contributions of this Work


Among the vast research topics in the VLSI community, we have focused on a few fundamental problems in VLSI design automation, taking into account important analog phenomena such as crosstalk and parasitics, while attempting to improve computational efficiency in a fundamental sense. The main contributions of this work are as follows.

- A novel incremental approach to exact placement optimization is presented. The approach gives significantly better asymptotic computational complexity results for a single placement computation iteration within a simulated annealing environment [9]. A theoretical analysis is given, supported by experimental results.

- A new consistent linear-time algorithm is given for mapping a given placement of modules in a user-specified region to an efficient formal representation. The algorithm is, for instance, useful for converting graphical (user-interface) data to an abstract format which can be further processed by means of efficient algorithms. The algorithm can also be utilized for applying hierarchical notions to a given placement.

- An improved robust placement algorithm is given which can incorporate range and boundary constraints imposed on specific modules in an efficient manner. Experimental results are given to illustrate this.

- A framework has been established which incorporates placement and global routing. Within this framework it is relatively easy to incorporate physical problems related to the spatial distribution of objects in a plane [10]. This type of consideration is becoming increasingly important for contemporary mixed-signal designs. For example, substrate coupling in high-performance mixed-signal ICs [11] is causing serious circuit performance degradation, and surface gradients in high-speed digital-to-analog converter designs cause serious spectral performance losses [12]. We note that the overhead in computational complexity is mostly in constant factors, due to an efficient combination of advanced data structures.

- We establish new results on very fast Steiner minimal tree (SMT) approximation algorithms in combination with efficient dynamic routing graph models. The novelty is mainly due to the combination of an efficient sparse routing graph model and improved shortest-paths-based heuristics. We compute heuristic routing results on a broad set of placement-derived problem instances and net sizes, which we compare to optimal routing solutions. These optimal solutions are obtained using state-of-the-art third-party tools. The differences in solution quality are small (0% to 5%), which implies that more expensive SMT heuristics can only improve marginally, at the cost of a (significant) increase in execution time. It turns out that, due to the difficulty of some of the routing problem instances, not all optimal solutions are computable [13]. It is the first time that, in the context of accurate placement and global routing, optimal results are computed for routing problems derived from practical placements.

- Extensive experimental evaluation of the proposed placement and routing algorithms has led to new results which compare favorably with existing state-of-the-art results. Furthermore, based on experiments, we expose discrepancies in most current packing-centric works that use inadequate routing schemes.


Chapter 2

Problem Definition
This chapter describes the problem which is attacked in this thesis in more detail. We show explicitly where the problem of layout generation is located within the overall VLSI design cycle. Then we zoom in on the layout problem and show that it is a non-trivial problem to solve. In order to solve the problem adequately, it first has to be defined in an accurate way. One part of the problem definition entails proper modeling of physical entities. The other part is the formulation of given real-life specifications into simpler specifications that can be handled properly at an algorithmic level. In principle, the layout problem can be split into two strongly coupled parts. One part is the placement or floorplanning problem; the other part is the routing problem.

A typical layout problem could be stated as follows. Given a set of geometric objects to be placed in a two-dimensional plane, place these objects in such a way that a certain cost function is minimal. A standard textbook on physical design automation will take for the cost function the total length of all interconnecting wires. The catch is that in order to compute or estimate the length of a wire, placement information is needed. But routing information is needed to compute a placement! This loop can, for instance, be broken by computing a placement based on the number of interconnections between the blocks; blocks with many interconnections should be placed closer together than blocks with fewer interconnections. In textbooks, this approach is called the min-cut problem. It is an approximation to the layout problem, in that it does not deal with wire length but with the number of interconnections. For more information on the layout problem the reader is referred to, e.g., [14, 15].
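To make the chicken-and-egg argument tangible: the half-perimeter wire length (HPWL) is a standard textbook estimate of a net's routed length, and it is computable only once a placement has fixed the pin coordinates. The sketch below is illustrative; the coordinates are invented for the example.

```python
# Half-perimeter wire length (HPWL): a standard textbook estimate of one
# net's routed length, once pin coordinates are known from a placement.

def hpwl(pins):
    """pins: list of (x, y) pin positions of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Without the placement-derived coordinates below, no wire-length
# estimate is possible -- which is exactly the loop described above.
print(hpwl([(0, 0), (4, 2), (1, 5)]))  # (4 - 0) + (5 - 0) = 9
```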

2.1 Top-Down Flow and Bottom-Up Approach


In order to adequately solve the layout problem, it is necessary to divide this complex problem into less complex and conceptually easier-to-handle subproblems such as the placement problem and the routing problem. Moreover, we have to define more precisely the information required to solve the layout problem. Hence, the path from electrical circuit description, process technology data, and system specifications to the layout system is made explicit. Generally, we call the flow of operations from a high level to lower levels, leading to refinements at each step, a top-down flow. The information is essentially pushed in one direction, with or without some feedback path, to arrive at a desired target. The need for this essentially hierarchical approach is obvious when the initial problem is too difficult to assess at once.


In cases where we precisely know the impact of a certain higher-level decision on a lower level, a top-down approach is very convenient. However, when problems become more complicated and interdependencies start playing an important role, it is almost impossible to accomplish the task in an adequate way with solely top-down information. When the amount of information from a lower level becomes increasingly important at a higher level, we speak of a bottom-up flow. We argue that a bottom-up approach indeed naturally applies to layout generation. During the top-down flow of the design cycle, we arrive at the point where information needs to be supplied to the layout generation system. Hence, the interface of the layout system must be defined explicitly, facilitating communication of relevant information to the layout system. As a consequence, the layout system itself can operate more efficiently and consistently generate predictable results as a function of several input parameters, which will be described shortly.

2.1.1 A VLSI Design Cycle


The fundamental process steps in a VLSI design cycle are shown in Figure 2.1. The four blocks on the left-hand side represent processes that start from the very high level of an idea. Then specifications are defined in the next block. The architectural design process deals with functional blocks at a high level, for which (conceptual) realizations exist, and their intercommunication. Finally, at the lowest conceptual level, the functional blocks must be transformed into the building blocks which are used in circuit design. This is the top-down part of the cycle. The result of the top-down flow is an electrical circuit, which is the basis of the bottom-up flow. The last four blocks of the cycle consist of the generation of building-block layout modules at the lowest level, going up to placement and routing of these layout modules.

Figure 2.1: A general design cycle showing important aspects of VLSI design.


After placement and routing, the layout can be manufactured and finally tested to see whether its functionality and performance comply with the original idea and its specifications. Although the overall flow of information is top-down, the last four blocks in the diagram are drawn bottom-up. The intention is to make clear that the physical design part requires an essentially bottom-up approach. The reason for this is that a mixed-signal design has both digital and analog components, and typically these components are highly interconnected. It is this class of designs that is impacted most severely by parasitic effects such as substrate coupling, delay, mismatch, etc. As such, it is all but impossible to decouple placement and routing while targeting high quality, and predicting the result of placement and routing is at least as difficult.

The outer flow in the figure states what type of information is exhibited at a certain stage in the design cycle. At the highest level the behavioral representation is eminent. After that, the structural representation becomes important, in which more precise information is given on what functions are performed where. At the technological representation level, the implementation aspects come into play: it specifies what type of circuit elements are used, their properties, and so on. The physical representation level comprises everything that is directly related to the layout of the circuit on the wafer. Finally, a prototype IC is available.

Note that the direction of the arrows only indicates the flow of the processes in time for each part of the overall design, not the interdependency of the processes. For example, in order to perform adequate placement and routing, information is needed on certain specifications. Furthermore, in the architectural design, testing facilities should be taken into account. In short, strong interrelationships exist between almost all of the VLSI process steps. Hence, it is impossible to regard a specific process step without taking notice of the other steps. On the other hand, including many process steps in an attempt to find a universal layout methodology would be too idealistic because of the intrinsic problem complexities involved. A way to solve this dilemma is to define an interface from each block to the other blocks, specify exactly what is input and what is output, and find a methodology that will provide high-quality layouts within the confined framework. This is the well-known top-down approach.

2.1.2 Physical Design


The focus of this thesis is on several aspects of the physical design step, more specifically the placement and routing phases. Physical design is the last step in the design cycle where a designer can exercise his or her influence on the final performance of an integrated circuit before it is fixed onto silicon. In Figure 2.1, the part of the VLSI design cycle that this work focuses on is shown in the shaded area: the circuit, module generation, and placement/routing. The physical design step itself can also be seen as an iteration loop. In order to limit the complexity due to interdependencies, we presume that the given circuit, which is one of the inputs of the physical design step, is our nominal reference. As a consequence, we do not attempt to improve or alter the behavior of our reference circuit; our goal is to prevent deterioration of system performance as much as possible due to undesired but unavoidable implementation phenomena such as crosstalk, wire delays, surface gradients, etc. Figure 2.2 gives a classical flow diagram which represents the physical design. As can be seen from the diagram, the input of the physical design phase consists of the circuit netlist, circuit specifications, and technology data.


By means of module generation, the basic objects are conceived for the placement and routing phase, which are the core problems of physical design. The initial layout needs to be checked for design-rule compliance. After that, an extraction of the layout needs to be performed. The extracted information is an annotated netlist including all parasitic elements which are not, or only partially, accounted for in the circuit netlist (schematic). This annotated netlist is compared with the original netlist to see if any discrepancies have been introduced, apart from the parasitics. Using the annotated netlist, circuit simulations are performed, typically with a Spice-like simulation tool. If all is well, and the specifications are complied with, the final layout is ready to be fabricated. If something is wrong, a change in the placement/routing is required and the loop is repeated until the layout is acceptable. However, it may turn out that the layout system cannot find a satisfactory solution (even if the system were ideal). In such cases there is an escape route via the dotted arrows to adjust, for example, the specifications or the transistor models which are used by the simulator.
Figure 2.2: A classical flow of the physical design step. [Flowchart: the circuit netlist, specifications and technology data enter module generation & placement/routing, which produces a layout; the layout goes through design rule checking (against the technology design rules), parasitic extraction (using electrical parameters and models) yielding an annotated netlist, a netlist comparison, and simulation against the specifications; whenever a check fails or the specs are not met, the placement/routing is modified and the loop repeats, otherwise the final layout results.]
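The loop of Figure 2.2 can also be paraphrased as the skeleton below. This is a sketch only: each callable stands in for a real tool (design rule checker, parasitic extractor, simulator), and none of the names refer to an actual API.

```python
# Skeleton of the Figure 2.2 loop; every callable is a stand-in supplied
# by the caller (a real DRC, extraction and simulation tool in practice).

def physical_design_loop(layout, drc_clean, extract, netlist_ok, specs_met,
                         modify, max_iters=10):
    for _ in range(max_iters):
        if not drc_clean(layout):           # design rule checking
            layout = modify(layout)
            continue
        annotated = extract(layout)         # parasitic extraction
        if not netlist_ok(annotated):       # compare with original netlist
            layout = modify(layout)
            continue
        if specs_met(annotated):            # simulation against the specs
            return layout                   # final layout
        layout = modify(layout)
    return None  # escape route: adjust the specs or the models instead
```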


2.1.3 Mixed-Signal Layout Styles


Several layout styles are available for implementing mixed-signal and analog integrated circuits. The differences between these styles typically lie in density, performance, flexibility, and time-to-market, where one characteristic is usually traded off against another. There is a broad variety of different layout styles, and it is difficult to classify them from the point of view of layout flexibility, i.e. the degrees of freedom a designer has to make layout decisions. Hereafter follows a brief overview of a few common layout styles. For more information, the interested reader is referred to [16].

Full Custom

In a full-custom layout every component in the design is hand-crafted with the ultimate trade-off between performance, area, and power, which often results in highly irregular placement and routing. Typically, no restrictions are imposed on the width, height, aspect ratio, or terminal positions of the layout blocks. Furthermore, each block is allowed to be placed at any location on the chip surface without restrictions. Of course, design rules have to be taken into account at all times. Obviously, this technique has the largest flexibility, the best performance, and a very high integration density, since the layout can be optimized and tuned for each specific application. A major drawback of the full-custom layout technique is that it is immensely labor-intensive, resulting in large turnaround times and thus a large time-to-market. In addition, the tools that (partially) support the designer in creating a high-quality layout are very complex (although their main task is to limit design complexity) and can only lead to a good layout with the aid of an expert.

Standard Cell

In order to overcome the drawbacks of the full-custom layout style, mainly due to complexity, several methods have been proposed to mitigate the overwhelming effect of complexity combined with full design freedom. This is essentially accomplished by putting restrictions and constraints on the physical design of the circuits. Standard cell layout is a common layout technique which was first introduced in the digital VLSI domain. It is featured by the use of a standard library of prefabricated cells with different functionalities. The standard cell (a layout block) is restricted to a fixed height and has variable width. All cells are placed in a number of rows. A certain amount of space between two rows, also called a channel, is reserved for routing. Thus, placement and routing have become (conceptually) simpler.

Field-Programmable Gate Array (FPGA)

The essence of an FPGA consists of a fixed number of functional (but primitive) building blocks distributed on a chip, where the actual interconnections are defined via electrically programmable switches. FPGAs cannot be used for higher frequencies because of inferior routing and additional parasitics.

Sea of Gates

The sea-of-gates design style is comparable with FPGA: all layout blocks are predefined on chip and the designer only has to define the interconnect. Unlike FPGA, no switches are used to define the routing.


Instead, the interconnect is defined by a separate process step. Therefore, a sea-of-gates design cannot be used as easily for rapid in-house prototyping as an FPGA. Another noticeable difference between FPGA and sea-of-gates is that the latter has a very fine grain size compared with the former. Typically, an FPGA primitive cell consists of a multiple-transistor circuit, whereas a sea-of-gates primitive cell is a single transistor.

2.1.4 From Circuit to Layout


Figure 2.3 shows the top-down flow of information from a higher, circuit-level description (schematic level) to a lower-level description of the same information incorporating implementation details (module level). Along the way, more implementation-level details are incorporated using process technology data, with higher-level specifications guiding the intermediate decisions that have to be taken. More specifically, at each level the higher-level specifications are translated into specifications which are meaningful for that specific level. In the diagram, we can speak of high-level (overall circuit) specifications, intermediate-level (subcircuit) specifications, and low-level (layout module) specifications. We assume that such a translation is always possible, although it might be difficult. This translation problem is discussed shortly, under the umbrella of the mapping problem.

At the schematic level, the relevant specifications are the resistance value if the module is a resistor, a capacitance value if the module is a capacitor, a width-over-length value if the module is a CMOS transistor, etc. At the subcircuit level, for instance, matching between circuit elements is taken into consideration; matched elements must be grouped into circuit modules. At the layout module level, important specifications are drain/source capacitance, gate impedance, module size, etc. As can be seen, the technology data has impact on each level of abstraction. Thick arrows indicate a major translation effort (in order to obtain a high-quality layout) as compared to thin arrows. This translation effort falls under the umbrella of the mapping problem.

At this moment, it is convenient to view the layout system as a black box with certain inputs and outputs. The exact contents will be specified by decisions which are to be taken later on, based on the information given in this chapter. The layout system interface takes as input:

- the process technology data: design rules, via resistance and capacitance, metal sheet resistances, substrate resistance, guard ring constructions, etc.;
- information on pins: the allowable range of resistance ($R_{min}$, $R_{max}$), capacitance ($C_{min}$, $C_{max}$), and inductance seen at the output of a pin, and the range of current amplitudes;
- information on modules: the height ($h$) and width ($w$) of each module, the exact position of each pin connected to a module, the nets connected to a module, the sensitivity or noisiness of a module, etc.;
- a cost function: the parameters that need to be optimized in the layout, the importance of certain parameters over others, constraints on specific module positions, constraints on the total layout size or aspect ratio, etc.

A pin is located at the perimeter of a layout module, and forms a gateway to the outside world as seen from the module. Furthermore, a (layout) module is assumed to be rectangular.
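To make this interface concrete, the sketch below collects the inputs listed above in simple record types. This is an illustration only, not the implementation used in this thesis; all type and field names (Pin, Module, CostParameters) are hypothetical.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Pin:
    name: str
    position: Tuple[float, float]    # location on the module perimeter
    r_range: Tuple[float, float]     # allowable resistance (R_min, R_max)
    c_range: Tuple[float, float]     # allowable capacitance (C_min, C_max)

@dataclass
class Module:
    name: str
    width: float
    height: float
    pins: List[Pin] = field(default_factory=list)
    noisiness: float = 0.0           # capability to degrade neighboring modules
    sensitivity: float = 0.0         # vulnerability to substrate noise

@dataclass
class CostParameters:
    w_area: float = 1.0              # weight of chip area
    w_wire: float = 1.0              # weight of total wire length
    w_substrate: float = 0.0         # weight of substrate-coupling impact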

Figure 2.3: Top-down flow of a physical design process.

The output of the layout system is a layout which is essentially similar to the high-level overall reference circuit. After all parasitic elements have been added to the reference circuit schematic, a simulation should show that the layout complies with all specifications. It is important to note that although issues such as yield and reliability are not taken into account, the layout system should not preclude the integration of these important matters. Therefore, generality of the layout system is a concern throughout this work.

2.1.5 Layout System Requirements


Since a practically feasible layout system is our ultimate goal, some requirements for ensuring this have to be set:

- computational efficiency must be high, to allow for scalability;
- generality must be high, to allow for easy incorporation of models of performance degradation;
- robustness must be high, to produce consistently good and predictable solutions.

2.2 The Mapping Problem


The layout problem has been defined in terms of a magic black-box system that, given specific inputs, has to produce a high-quality layout as output. There is a noticeable difference between the input parameters of the layout system interface and the high-level specifications and circuit descriptions. This discrepancy between the real-world high-level specifications and the interface specifications has to be resolved. In other words, we have to solve the mapping problem, which is defined as follows.

Problem: The mapping problem
Instance: A set of high-level overall circuit specifications, and a set of desired layout system interface inputs.
Solutions: All mappings that translate the high-level specifications into the layout system inputs.
Minimize: The sensitivity of high-level specifications to layout system input parameters while adhering to all specifications, so that the best possible layout can be output.

Note that minimizing sensitivity is similar to maximizing flexibility in parameter value range in practical circumstances. Implicitly, this mapping is shown in Figure 2.3. Generally it is not trivial to perform this mapping. Hereafter, the mapping problem is discussed to make the reader aware of this problem, but no solution is proposed in this thesis.

2.2.1 High-Level Specifications


High-level specifications define and quantify the functionality and the quality of an overall circuit in typical terminology. Depending on the type of circuit under consideration, different terminology is used. Table 2.1 shows a few examples of circuits and their associated high-level specifications. The relationship between the high-level specifications and the functionality of a circuit is clear. A direct consequence of this is a broad diversity of quality measures.

2.2.2 Layout System Specifications


The most striking difference of the low-level specifications in terms of our layout system interface, as compared to the high-level specifications, is that the measures appear to be independent of the circuit type. Actually, the dependencies are hidden in the pin info and module info inputs of the interface. Recall that the pin information holds, among others, the allowable range of load capacitance, resistance, etc. Furthermore, the module information holds, among others, the locations of the pins at the perimeter of the module, the sensitivity of the module, etc. A direct consequence of this observation is the problem of how high-level specifications are mapped to layout system specifications.


Table 2.1: Examples of high-level circuit specifications.

system type       typical specifications
analog filter     bandwidth, quality factor
D/A converter     integral nonlinearity, differential nonlinearity, spurious-free dynamic range
digital decoder   maximal propagation delay, fan-out, clock frequency, logic functionality

2.2.3 Constraint Mapping Problem


A specification is also called a constraint. Under the general umbrella of constraint management and transformations, the (constraint) mapping problem has been investigated by numerous researchers. However, all known works with mathematical justifications have been restricted to purely linear analog systems [17]. The reason for this is quite obvious; computations with linear systems and their properties are much simpler than with nonlinear circuits. Although some authors have claimed that the linear approach can also handle nonlinear systems by using linearization techniques, this only holds true when operating in the close vicinity of a certain static biasing point. For typical mixed-signal designs such as A/D and D/A converters, in which large signal transitions occur and biasing points are definitely not static, linearization is inappropriate. As a consequence, heuristics have been used to transform high-level constraints to low-level constraints [18]. A major disadvantage of the use of heuristics is that it might overlook problems which also need to be considered. Also, it is hard to quantify the quality of obtained results in terms of high-level specifications, even in case of full compliance with the heuristic rules.

2.2.4 High-Level Sensitivities


Due to the fact that a (large) set of tunable parameters is available in the mapping problem, sensitivities are needed to obtain a proper set of parameters. This set of parameters should be representative of the robustness of the circuit. Generally, the high-level sensitivities as a function of high-level parameters are strongly nonlinear with strong interdependencies. Consequently, proper transformation of parameters, for instance to a lower-level parameter, with awareness of sensitivities is a daunting task. When a circuit is designed at a high level of abstraction, a good designer knows that the actual component values and properties he or she had in mind when conceiving the design are most likely not the exact values achieved in a prototype. Thus, in order to mitigate the effect of deviations from the nominal (desired) values, it is necessary to have an idea of circuit sensitivities. Usually these sensitivities are not specified, but it is a well-known fact that a consumer-ready design must be robust against process deviations. In Table 2.2 a few examples are given of high-level sensitivities for an analog filter, a D/A converter, and a digital decoder. As can be seen, the high-level sensitivities of a circuit are directly related to its functionality, resulting in a diversity of sensitivity measures.


Table 2.2: Examples of high-level sensitivities.

system type       typical high-level sensitivity
analog filter     robustness of transfer function to variations of passive components in the circuit
D/A converter     clock jitter effect on spurious-free dynamic range
digital decoder   variations in signal delay relative to output load

2.2.5 Lower Level Sensitivities


When high-level specifications are transformed to lower-level specifications, a desirable property is that the high-level specifications are adhered to if the lower-level specifications are adhered to. Furthermore, it is undesirable that a small change in a lower-level parameter causes a large change in a higher-level parameter. Therefore, the sensitivity of high-level parameters needs to be minimal with respect to lower-level parameters. We denote this relationship by low(er)-level sensitivity. An example of a low-level sensitivity (at the subcircuit level) is the sensitivity of integral nonlinearity with respect to matching of transistors in a specific differential pair. Generalizing, every sensitivity can be expressed as the level of dependency of a high-level specification on a lower-level parameter. Formally this is written as

$S_p^P = \frac{\partial P}{\partial p}$   (2.1)

where $P$ is some kind of performance measure and $p$ is a lower-level parameter. If high-level sensitivities are known, (2.1) can also be computed using

$\frac{\partial P}{\partial p} = \frac{\partial P}{\partial q} \cdot \frac{\partial q}{\partial p}$   (2.2)

where $q$ is a high-level parameter (such as clock jitter). Layout system sensitivities are (implicitly) represented by the range of allowable values for each pin parameter: ($R_{min}$, $R_{max}$), ($C_{min}$, $C_{max}$), etc. Also modules have low-level sensitivity measures associated with them. For instance, module noisiness and module sensitivity are two module parameters that are useful for minimizing the detrimental effect of substrate coupling. The former quantifies the capability of performance degradation that can be inflicted on neighboring modules. The latter quantifies the vulnerability of a certain performance measure to substrate noise.
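As a numerical illustration of (2.1) and (2.2), the sketch below estimates a low-level sensitivity with central finite differences, both directly and via the chain rule through a high-level parameter. The functions P_of_q and q_of_p are toy stand-ins, not actual circuit models.

def P_of_q(q):
    # high-level performance as a function of a high-level parameter q
    return 100.0 - 2.0 * q ** 2

def q_of_p(p):
    # high-level parameter as a function of a low-level parameter p
    return 3.0 * p + 0.5

def derivative(f, x, h=1e-6):
    # central finite-difference estimate of df/dx
    return (f(x + h) - f(x - h)) / (2.0 * h)

p0 = 1.0
dP_dp = derivative(lambda p: P_of_q(q_of_p(p)), p0)              # (2.1) directly
chain = derivative(P_of_q, q_of_p(p0)) * derivative(q_of_p, p0)  # (2.2) chain rule
print(dP_dp, chain)   # both estimates agree (here: -42.0)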

2.2.6 Sensitivity Computation Problem


As discussed previously, in connection with the mapping problem, sensitivities play an important role; the accuracy of a specification, and the ability to adhere to that specification with high probability, is significantly influenced by low-level sensitivities as defined by (2.1). The exact computation of (relevant) sensitivities falls outside the scope of this work. Nonetheless, we point out that sensitivity computation is essential to successful practical layout generation. In order to show the effectiveness of certain concepts, we use randomly generated sensitivity values. These values are not related to a physical property, but merely serve to show the strength of a methodology. Consequently, the sensitivity values should be interpreted in a fuzzy sense with a relative character.


2.3 Placement and Routing Constraints


Classical constraints on placement and routing are smallest possible chip area and minimal wire length. In the context of mixed-signal layout generation this is clearly not sufficient. Therefore, additional constraints are needed, such as crosstalk-aware routing constraints, substrate-aware placement constraints, matching-aware placement constraints, and so on. In this thesis, we attempt to develop a general-purpose framework which allows for the incorporation of these additional constraints in a straightforward way. Although not all concepts have been actually worked out down to the implementation level, the awareness of these constraints is paramount in our approach. From another standpoint, we are forced to put additional constraints on the placement and routing approach. This is mainly a direct consequence of computational efficiency considerations. From this point of view, a very important placement-related constraint is the fact that overlap of modules should be avoided. An argument to allow for overlap is that it could possibly merge source/drain connections of a pair of transistors, leading to a reduction in source/drain capacitances [8]. Overlap is, however, detrimental to system performance in two ways. First, overlap typically is undesirable because it generally leads to design-rule errors. Hence, the evaluation of overlapping placements is a costly waste of computation time. Second, it is not possible to make accurate estimations with respect to, for example, substrate coupling and wire length for an illegal placement with overlap. We note that allowing general overlap is not a good technique to obtain effective (and efficient) merges, as desired merges can be handled within modules a priori. For instance, candidate transistor pairs for merging can be identified beforehand and put into a single module in advance. This approach does not only solve the issue of overlap, but also reduces the size of the placement solution space. Since we allow a module to consist of sub-circuits, the number and positions of the pins at the circumference of a module should in principle be unrestricted. The constraints we put on routing are as follows.

- We do not allow over-the-cell routing. Although more than two metal layers are typically available for routing in modern process technologies, the exploitation of the lowest metal layers can benefit: the reduction of routing problems at higher layers, the reduction of yield-decreasing vias, and the avoidance of unpredictable interaction with intellectual-property (IP) blocks.
- Each pin-interconnecting network should have minimal length. For reasons of simplicity, but without being too restrictive, we assume that this is an optimal way to connect the pins of a net. Unfortunately, this apparently simple problem (at least for a small number of pins) is a very hard problem, which is better known as the Steiner minimal tree problem [19, 20].

We justify these restrictions using the following arguments. In mixed-signal layout design, the effective use of space around modules in the lowest metal layer decreases the unwanted coupling between, for instance, polysilicon and metal considerably. Moreover, any created coupling can be controlled much more tightly. The coupling from higher metal layers to the bottom layers is significantly smaller, which justifies the use of higher metal layers for over-the-cell routing. The minimal-length metric is less restrictive than it seems, since it does not imply a geometric metric. In fact, a very broad class of interconnection networks can be covered by
defining (sophisticated) weight functions for the branches in the network. These weight functions typically depend on physical properties of each branch, e.g. the voltage/current variation and magnitude, or the physical location of each branch. The latter accommodates (parasitic) interaction of this branch with neighboring obstacles. Besides the fact that area and wire-length constraints are very important, they are definitely not the sole constraints relevant to placement and routing. Especially in the context of mixed-signal layout generation, we must refine this set of primary constraints and additionally include, or allow for the inclusion of, performance-related constraints such as substrate-coupling impact minimization, crosstalk minimization, optimal matching, etc. The proposed framework should be able to incorporate the overall set of constraints in an efficient manner.

Chapter 3

Optimization Methods
In this chapter a variety of well-known VLSI optimization methods is described. As pointed out in Chapter 2, there are many constraints involved in mixed-signal layout generation, which makes this task intrinsically difficult to solve properly. Moreover, due to the many types of constraints that are involved, the type of optimization algorithm which is used to generate a layout can have a significant influence on the final result, both in quality and in computation time. Naturally, each type of optimization framework has its pros and cons. The points that are regarded as important in our task are:

- easy handling of a heterogeneous mixture of constraints,
- efficient placement and routing representations,
- efficient computation of placement and routing solutions,
- practical achievability of near-optimal results,
- low implementation complexity.

First an overview of existing approaches to successful VLSI optimization is given. Then one of the approaches is selected, based on the previously described criteria, and used for our optimization framework. It should be noted that most of the described optimization methods have been shown to work well on a given set of problems. Conversely, it is a known fact that an optimization method that performs well on a certain class of problems might perform poorly on another class of problems, with or without tuning. Thus, generalizing results to related or modified problems should be done with utmost caution. We attempt to place the methods presented below under the same uniform umbrella of placement and routing. However, only some of these methods have properties which are suitable for general placement and routing, taking into account the previously mentioned important points. We elaborate on one of the most promising methods, which is known as the simulated annealing algorithm.

3.1 VLSI Optimization Methods


The layout generation problem is inherently very difficult to solve. Even when split into several sub-problems it remains difficult to solve. Generally, all non-trivial problem instances
are intractable, i.e. it requires an excessive amount of time to solve a problem instance to optimality when the instance size is increased. In other words, the problems are NP-hard [21]. Nonetheless, in practice the layout generation problem is split into a placement and a routing phase. The latter may again be split into a global routing and a detailed routing phase. As a direct consequence of the NP-hardness of layout generation, we have to resort to heuristic or approximation methods that yield an acceptable solution within reasonable time. The following classification might not be optimal, but it is one that matches well with contemporary ideas. Furthermore, it provides a good impression of the vast body of research activities in this field. An extensive overview can be found in [14]. A very recent, and more mathematically flavored, comprehensive overview is contained in [22].

3.1.1 Deterministic Algorithms


A deterministic algorithm is a recipe that describes which steps have to be taken sequentially in order to transform a set of input values into a set of output values. For such an algorithm no random number generator is needed to execute and find a solution. These types of algorithms are typically used in a graph representation of a problem. Typical properties of deterministic algorithms are:

- sub-optimality of the solution,
- high execution speed,
- the same solution is found each time the algorithm is run.

As deterministic algorithms were the first type of algorithms to see the light, the number of such algorithms is very large. Only a few deterministic algorithms will be mentioned here.

Problem-dependent Methods

Rule-based algorithms
In this approach, expert knowledge is translated into rules which are used by the system to generate a proper layout. Clearly, the quality of the rules is of paramount importance. Furthermore, the set of rules should be adapted to accommodate new types of circuits and layout techniques as they are introduced. As a consequence, maintaining a good set of rules is labor-intensive. A fundamental problem in connection with a rule-based approach is the difficulty of defining general and context-independent rules.

Template-based algorithms
As the name implies, templates are used as a starting point, guided by specific values of input parameters, to transform a certain template into a proper layout. The creation of the templates is a knowledge-intensive task, which is one of the main bottlenecks of this approach. Moreover, the set of obtainable layouts is limited to the set of available templates and their combinations.

Problem-independent Methods

Linear programming algorithms
A linear programming algorithm describes the problem as an $m \times n$ constraint matrix $A$, an $m$-vector $b$, and a cost vector $c$. A solution of a linear problem is then a vector $x$ that satisfies the linear constraints $Ax \le b$ and $x \ge 0$, while minimizing $c^T x$.
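For illustration, the snippet below solves such a linear program with SciPy's linprog routine; the two-variable instance is made up.

from scipy.optimize import linprog

c = [-1.0, -2.0]                 # cost vector (equivalent to maximizing x1 + 2*x2)
A = [[-1.0, 1.0],                # constraint matrix A
     [3.0, 2.0]]
b = [1.0, 12.0]                  # right-hand-side vector b
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)            # optimum at x = (2, 3)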

Divide-and-conquer algorithms
Divide-and-conquer algorithms partition the problem into more or less independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.

Dynamic programming algorithms
Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems. Programming in this context refers to a tabular method, not to writing computer code. In contrast to divide-and-conquer algorithms, dynamic programming is applicable when the subproblems are not independent, that is, when subproblems share subsubproblems. In this respect, a divide-and-conquer algorithm does more work than necessary by solving common subsubproblems more than once. This is avoided by dynamic programming through the use of a table in which each solution to a solved subsubproblem is stored, saving a significant amount of computation time (a small illustration is given after this list).

Branch-and-bound algorithms
The branch-and-bound method is an exact method that can be applied to a broad class of problems. All that is required is a tree-structured configuration space and an efficient way of computing tight lower bounds on the cost of all solutions containing a given partial solution. Typically this method can only be applied successfully to small problem instances, but with clever pruning techniques a larger solvable range can be reached.
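The toy example below contrasts the two methods on a recursion with heavily shared subsubproblems; the table described above is provided here by Python's built-in lru_cache memoization.

from functools import lru_cache

def fib_naive(n):
    # divide-and-conquer: shared subsubproblems are recomputed many times
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_dp(n):
    # dynamic programming: each subproblem is solved once and tabulated
    return n if n < 2 else fib_dp(n - 1) + fib_dp(n - 2)

print(fib_dp(40))   # immediate; fib_naive(40) needs on the order of 10^8 calls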

3.1.2 Stochastic Algorithms


Stochastic algorithms have been introduced to circumvent most problems of deterministic algorithms. There is a fundamental difference between inherently stochastic algorithms and stochastic versions of deterministic algorithms. The former type of algorithm is usually inspired by natural phenomena. The latter is a randomized extension of a deterministic algorithm in order to improve worst-case performance and facilitate algorithm analysis. Although most inherently stochastic (or probabilistic) algorithms are based on elegant theories with very desirable properties with regard to their ability to find a globally optimal solution, practical constraints limit the usefulness of these theories. The most striking example is a commonly used mathematical operation $\lim_{t \to \infty}$, where the time variable $t$ approaches infinity. When convergence is slow, the required time for finding an optimal solution may become prohibitively large, thus rendering the algorithm practically worthless. Nonetheless, an algorithm in this class can be useful when a near-optimal solution is good enough and such a solution can be obtained within a reasonable amount of time. By truncating an unlimited time interval to a limited interval, theoretical properties are normally invalidated. Hence, no guarantee can be given on how close an obtained solution lies to an optimal solution. Techniques to improve speed, average solution quality, or any other desirable property which is not supported by mathematical evidence, turn any algorithm into a heuristic algorithm.


Two well-known stochastic algorithms are simulated evolution [23, 24] and simulated annealing [25]. Lately, there has been an increased interest in so-called memetic algorithms, a concept which sprouted from the mind of Dawkins [26]. Memetic algorithms are a generalization of genetic algorithms in which the human mind plays a crucial role; cultural influences have a significant effect on the survival capability of a certain species, in conjunction with specific natural genetic properties.

3.1.3 Heuristic Algorithms


Heuristic algorithms, or simply heuristics, belong to a popular class of algorithms in which intuitive ideas or promising tricks are employed to search for good solutions of NP-hard problems. Also, (partial) randomization might be used to achieve this. Generally, there is no way of finding out how close we have come, in absolute measures, to a global optimum and with what probability. Even though some solutions that are obtained for given problem instances might be quite good, no guarantee can be given on how well the heuristic will perform on another problem instance. The heuristic approach is by far the most widespread method in practice today. All of the iterative improvement techniques (in the context of NP-hard problems), both deterministic and stochastic, fall into this category. Note that even an algorithm such as simulated annealing, which will find an optimal solution in theory, turns into a heuristic due to practical constraints such as limited time. The last observation implies that optimization results produced by heuristics should be evaluated with statistical means in order to enable fair comparison. A true drawback of most heuristics is the poor reproducibility of solutions, which makes it hard to verify or compare independently obtained results. Due to the sensitivity of the results of heuristic approaches to the optimization environment, it is difficult in practice to make reliable comparisons (even in a statistical sense).

The most popular stochastic heuristic approaches are based on either the genetic algorithm or the simulated annealing algorithm. The genetic algorithm or simulated evolution algorithm [24, 27] is suitable for a wide variety of optimization problems. It represents an artificial simulation of the biological evolution of species, as conceived by Charles Darwin [23]. The optimization problem is described as a set of candidate solutions (the search space) and an objective function that has to be maximized. In analogy with biological systems, the objective function is called the fitness function. Each solution is associated with a certain fitness value. In order to find a solution with highest fitness, the genetic algorithm produces a sequence of populations of candidate solutions. The generation of each successive population is a random process, guided by the fitness of the members of the previous population. Typical biological phenomena are simulated during population generation. The most important ones are: selection, mutation, and cross-over.

The simulated annealing (SA) algorithm is by far the most used stochastic optimization algorithm in contemporary literature in connection with layout generation. The reasons for its success are mainly: simplicity of implementation1, incentives induced by results of previous approaches, its flexibility with respect to the type of problems and constraints it can handle, and, last but not least, the conceptual similarity between the problem description and a (straightforward) equivalent formulation in terms of simulated annealing entities. As will be
1 The concept is very simple to implement, but, admittedly, a robust implementation requires a large amount of effort.
shown in Chapter 6, the SA algorithm is a very promising candidate for the layout generation problem. A separate section is dedicated to discussing its general features.

3.2 Simulated Annealing


Simulated annealing (SA) is an algorithmic approach based on the thermodynamic annealing process. If a hot bath of crystalline material is lowered slowly enough in temperature, i.e. annealed, a perfect crystalline lattice is obtained. A perfect lattice with minimal stress is associated with a global minimal-energy optimum. Conceptually, the computational approach is as follows. The initial temperature is chosen high enough so that the end result will be independent of the initial state. Then the temperature is lowered slowly according to a specific cooling schedule. The cooling must be performed slowly enough that thermal equilibrium can be reached at each temperature. Small perturbations are generated and applied to each state, causing the system to jump to another state with a different associated energy. If the energy of the new state is smaller than the energy of the current state, then the new state is accepted. However, if the new state has a higher associated energy, it is accepted with a certain probability. This acceptance probability depends on the energy difference of the states and the current temperature of the system. This procedure is iterated until the temperature is low enough and the system is said to be frozen. An extensive body of research on simulated annealing, or more generally, nonlinear stochastic global optimization, is active in several scientific fields. To name just a few: statistical mechanics, computational biology, computer science, mathematics, electrical engineering, operations research, etc.

3.2.1 Basic SA Algorithm


The basic simulated annealing algorithm is shown in Fig. 3.1. A quick skim through the algorithm directly reveals its simplicity and its generality. Each problem is represented by a set of states. The number of states can be infinite in theory, but due to the finiteness of computer representations the set is finite in practice. However, this is not a limitation. Essentially, the algorithm starts from a random initial state $s$ and applies a perturbation to this state to find a new state $s'$ on line 6. On line 7 the cost associated with state $s'$ is computed. Then, on line 8, the acceptance test is performed. If the cost of the new state $s'$ is lower2 than the cost of state $s$, then state $s'$ is unconditionally accepted. On the other hand, if the cost of state $s'$ is higher than the cost of state $s$, then state $s'$ is accepted with probability

$\Pr[\, s' \text{ is accepted} \,] = e^{-(c(s') - c(s))/T}$   (3.1)

where $c(s)$ and $c(s')$ are the costs associated with states $s$ and $s'$, respectively, and $T$ is the temperature of the system. The function of the temperature is as follows. When $T$ is large (compared to a typical cost difference), the right-hand side of (3.1) is close to one. This implies that at high temperatures, cost increments are almost always accepted. When the temperature is decreased gradually, the impact of the cost difference will get more pronounced.
2 The lower the cost, the better.
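The temperature dependence of acceptance rule (3.1) is easily demonstrated numerically; in the hedged sketch below, the same cost increment is almost always accepted at a high temperature and almost never at a low one (all values are arbitrary).

import math, random

def accept(delta_cost, T):
    # Metropolis rule: always accept improvements, accept increments
    # with probability exp(-delta_cost / T)
    return delta_cost <= 0 or random.random() < math.exp(-delta_cost / T)

delta = 1.0
for T in (100.0, 1.0, 0.01):
    rate = sum(accept(delta, T) for _ in range(10000)) / 10000.0
    print(f"T = {T:6.2f}   empirical acceptance rate = {rate:.3f}")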

Input: solution state space and all optimization parameters
Output: (near-)optimal state $s_{best}$
1   s ← random initial state
2   c ← cost(s)
3   s_best ← s; c_best ← c; T ← T_0
4   while T > T_stop
5   do
6       generate s' ← perturb(s)
7       compute cost c' ← cost(s')
8       if random[0, 1) < A(c, c', T)
9       then
10          s ← s'
11          c ← c'
12      fi
13      if c < c_best
14      then
15          s_best ← s
16          c_best ← c
17      fi
18      decrease temperature T
19  od
20  return s_best


Figure 3.1: A basic simulated annealing algorithm.

Consequently, at low temperatures the right-hand side of (3.1) will be close to zero for a typical cost increment. Thus, the probability of accepting the corresponding state $s'$ will be very small. On lines 13-17, the best state and its associated cost are stored for later reference. The temperature is lowered on line 18.

Although the basic simulated annealing algorithm is simple in appearance, and has good practical performance, its internals are not well understood. Simulated annealing has attracted much attention because it treats every problem as a black box. Therefore a very large class of problems can be solved using SA. A few examples are: combinatorial problems [28], function optimization problems [29], and neural network optimization and training [30, 31]. Many adaptations of and extensions to the classical SA algorithm are known. The existing literature on this topic is too extensive to cover here. We only mention a few interesting concepts and approaches. In [32] Boese and Kahng observe that under finite-time conditions, the classical monotonically decreasing temperature schedule is not optimal when the best solution seen so far is the output of the SA algorithm, as opposed to the last solution seen (that is accepted). A recipe is given to derive a (near-)optimal best-so-far temperature schedule. In [33] Cong et al. propose to use a dynamic weighting Monte Carlo approach for floorplanning; they obtain promising results. The essential difference in their approach is an SA algorithm with a stochastic temperature schedule. In a general fashion, we can state that the function decrease temperature should be replaced by adjust temperature in order to maximize the power of SA.

The generality of SA comes at the cost of a large amount of computational resources that are required for practical problems. There are two ways to reduce the amount of computational resources. The first one is to minimize the number of iterations. This can be accomplished in various ways: by choosing a better representation of the problem, by modifying
the cooling schedule, by choosing a better generation function, by finding more suitable perturbation operators, etc. Also a mixture of the aforementioned items is not unimaginable. Actually, it is not known how to minimize the number of iterations in an optimal way. Most approaches rely on intuitive notions. Altogether, a practical SA implementation is truly heuristic in nature. The second way is to reduce the computations within a single SA iteration to a minimum. This approach is taken in this thesis. To state this more clearly in a more abstract way: the computational complexity of a single SA iteration is taken as the performance measure. For more information on computational complexity matters, the reader is referred to [34]. A point worth noting is the fact that for practically all non-trivial problems, computing the cost associated with a new state is the most time-consuming task. Therefore, it is of interest to investigate this part of the algorithm. The key ingredients for an SA algorithm are discussed next.
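Before turning to those ingredients, the sketch below gives a minimal executable rendering of the algorithm of Figure 3.1, applied to a toy one-dimensional cost function. All tuning constants (initial temperature, cooling factor, moves per temperature) are arbitrary illustrative choices, not values advocated in this thesis.

import math, random

def cost(s):
    # toy multimodal cost landscape
    return s * s + 10.0 * math.sin(3.0 * s)

def perturb(s):
    # generation function: a small random move
    return s + random.uniform(-1.0, 1.0)

def anneal(T0=10.0, T_stop=1e-3, alpha=0.95, moves_per_T=100):
    s = random.uniform(-10.0, 10.0)           # line 1: random initial state
    c = cost(s)
    s_best, c_best = s, c
    T = T0
    while T > T_stop:                         # line 4: not yet frozen
        for _ in range(moves_per_T):
            s_new = perturb(s)                # line 6: generate s'
            c_new = cost(s_new)               # line 7: compute cost
            if random.random() < math.exp(min(0.0, -(c_new - c) / T)):
                s, c = s_new, c_new           # lines 8-11: acceptance test
            if c < c_best:                    # lines 13-17: keep best-so-far
                s_best, c_best = s, c
        T *= alpha                            # line 18: decrease temperature
    return s_best, c_best                     # line 20

print(anneal())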

3.2.2 Problem Representation


In principle, every optimization problem can be formulated in terms of a (discrete or continuous) mathematical function $f$. Solving the problem is then equivalent to minimizing $f$ as a function of its argument $x$. For most real-life problems, $x$ is multidimensional and evaluating $f$ for a given $x$ might not be straightforward. Exactly how difficult it is to evaluate $f$ strongly depends on the problem representation; every representation gives a different $f$. Figure 3.2 gives a visual idea of how different representations give different functions (and solutions). The representation associated with function $f_1$ has a most irregular landscape. The global minimum of $f_1$ is equal to the optimal solution of the original problem.

Figure 3.2: The influence of problem representation on the cost landscape.

A better representation, in terms of smoothness of the landscape, is given by $f_2$. The global minimum of $f_2$ also coincides with the optimal solution. Another representation $f_3$ is even smoother, but its global minimum deviates from the optimal solution. This representation might not be a proper candidate. Whether or not this is the case depends on the amplitude of the deviation. Note that easy evaluation of $f$ does not imply that the landscape is smooth (and vice versa). Smoothness of the landscape is, among others, determined by the ordering defined on $x$. Consequently, for different orderings of $x$, the appearance of the landscape changes, but all
global minima remain intact. Furthermore, for non-trivial problem instances we usually do not know an optimal solution. As a consequence, the de facto standard way of benchmarking is comparing with best known solutions. Note that the function $f$ is also called the cost function in a simulated annealing context, and $x$ is called a state. The global optima of the cost landscape are solely defined by the cost function. Furthermore, the shape of the cost landscape is to a large extent determined by the set of perturbation operators and the generation function defined within the simulated annealing environment; they define the local optima [35]. The previous statement can be explained by observing the fact that the perturbation operators and generation function determine whether or not a given state is optimal relative to its neighbors. In other words, the perturbation operators and generation function determine the neighbors of a state and thus an ordering on $x$, and consequently define all non-global local optima. Both of these ingredients will be discussed shortly.

3.2.3 Perturbation Operators


Simulated annealing is a sampling algorithm. It will (statistically) move in the direction of the best samples. Perturbation operators are needed in order to provide a representative set of samples. Essentially, a perturbation operator changes the current solution into a new solution: the perturbed solution. A minimum set of perturbation operators is needed to guarantee reachability of every solution.

Definition 1 (Complete Perturbation Set) If every state is reachable from an arbitrary initial state using a specific set of perturbation operators, then this set of perturbation operators is called complete, otherwise it is called incomplete.

Thus, an important requirement for a perturbation set is completeness. However, completeness alone does not imply good convergence behavior of the simulated annealing algorithm. As pointed out by Otten and Van Ginneken [35], the global minima of the state space depend completely on the cost function, but the local minima also depend on the perturbation operators. As it is desirable to have as few and as shallow local minima as possible, a perturbation set which induces a smooth cost landscape would be ideal. Unfortunately, usually there is no way to determine just how smooth the cost landscape is going to be when a certain perturbation set is chosen. As a result, intuition and practical experience lead to a choice of perturbation operators in practical circumstances. Another requirement for the perturbation set is its computational efficiency. A complicated perturbation can severely impact the performance of a single iteration of the SA algorithm, and this in turn will increase the overall run-time of the optimization process. Thus, simple perturbations which can be implemented easily and facilitate evaluation of the cost function from one iteration to another are favored.
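The small sketch below illustrates the statement above: the same cost values, combined with two different perturbation (neighborhood) definitions, yield different sets of local minima. States are simply indices into a fixed cost array, and all numbers are arbitrary.

costs = [5, 3, 4, 6, 2, 6, 1, 7, 0]

def local_minima(neighbors):
    return [s for s in range(len(costs))
            if all(costs[s] <= costs[t] for t in neighbors(s))]

def adjacent(s):
    # perturbation set 1: steps of +/-1
    return [t for t in (s - 1, s + 1) if 0 <= t < len(costs)]

def jumps(s):
    # perturbation set 2: steps of +/-1 and +/-3
    return [t for t in (s - 3, s - 1, s + 1, s + 3) if 0 <= t < len(costs)]

print(local_minima(adjacent))   # [1, 4, 6, 8]: four local minima
print(local_minima(jumps))      # [4, 6, 8]: the richer set removes one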

3.2.4 Acceptance and Generation Functions


The acceptance function assigns a positive probability to a pair of cost values and the temperature value. In a general form it looks like

$A = A(c, c', T)$   (3.2)


The acceptance function for standard SA is given by

$A(c, c', T) = \min\{1, \, e^{-(c' - c)/T}\}$   (3.3)

which is also called the Metropolis criterion. This has been the standard way of defining the acceptance function since the introduction of simulated annealing for optimization [25]. The generation function generates a new state from the current state. In its simplest form, the generation of a new state is independent of the current state and the current temperature. In its most sophisticated appearance, many optimization parameters can be involved: for instance, the current state, some kind of estimation of an error gradient, the temperature, etc. [36]. It should be noted that, typically, the choice of a certain generation function feature is based on heuristic grounds. Moreover, problem-dependent tuning is required in these cases. The generation function has been subject to many modifications. The exact appearance of this function is closely related to the representation (the state space) of the problem and the set of perturbation operators defined on it. It is loosely coupled with the cost function, and normally the generation function is not modified when the cost function is. An important requirement for the generation function is that it allows traversal of the entire state space. A good rule of thumb is, in a probabilistic sense, to allow traversal of the state space in a small number of steps at high temperatures, and to lessen reachability of states when the temperature decreases. Ultimately, the latter is similar to a local search strategy.

3.2.5 Temperature Schedule


The temperature schedule, which determines the rate at which the system is cooled down, is a very important tuning parameter of the SA algorithm. A commonly used schedule has a decreasing exponential shape, like

$T_k = T_0 \, \alpha^k$   (3.4)

where $\alpha$ ($0 < \alpha < 1$) is a cooling constant which determines the rate of cooling, $k$ is an iteration index which can be associated with discrete time, and $T_0$ is an initial constant temperature. The simple temperature schedule of (3.4) in principle ignores all problem-related aspects, such as the irregularity of the cost landscape, and lacks solution-quality awareness. As such, it is not very robust and generally it needs tuning for each problem instance in order to obtain acceptable results. In standard simulated annealing [25], the temperature can only be decreased according to a certain schedule. Other, more general, schedules exist but their general applicability is not known. It is worthwhile to note that non-monotone schedules are certainly worth investigating, motivated by promising results on a few (small) problem instances [32]. The only provably good temperature schedule, as yet, is due to Hajek [37]. He proved that the basic simulated annealing algorithm is guaranteed (in a statistical sense) to find an optimal state, i.e. one with minimal cost, when the cooling schedule has the following shape

$T_k = \frac{c}{\log(1 + k)}$   (3.5)

where $k$ is the iteration index and the constant $c$ is sufficiently large.


Another often-used temperature schedule is due to Huang and Sangiovanni-Vincentelli [38]. In this scheme the temperature decrement is calculated such that the slope of the observed annealing curve follows an assumed ideal annealing curve, in which the average cost of configurations decreases by an essentially constant amount measured against a $\log T$ scale. The derived expression is

$T_{k+1} = T_k \, e^{-\lambda T_k / \sigma_k}$   (3.6)

where $\sigma_k$ is the standard deviation of the cost seen at temperature $T_k$, and $\lambda$ is a positive-valued tuning factor which modifies the rate of cooling, with a typical value of $0.7$.
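The sketch below places the three schedules (3.4)-(3.6) side by side. Note that sigma_k is faked as a constant here, whereas (3.6) would estimate it from the cost distribution observed at each temperature; all constants are arbitrary.

import math

T0, alpha, c_hajek, lam, sigma = 10.0, 0.95, 5.0, 0.7, 2.0
T_exp, T_huang = T0, T0
for k in range(1, 6):
    T_exp *= alpha                               # (3.4): T_k = T0 * alpha^k
    T_hajek = c_hajek / math.log(1 + k)          # (3.5): provably convergent
    T_huang *= math.exp(-lam * T_huang / sigma)  # (3.6): slope-controlled
    print(f"k={k}  exp={T_exp:6.3f}  hajek={T_hajek:6.3f}  huang={T_huang:6.3f}")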

3.2.6 Stop Criterion


Each algorithm is supposed to end, and to return a final solution upon reaching that end. For the SA algorithm, the end condition or stop criterion has been implemented in numerous ways by various researchers. The most naive method to implement a stop criterion is to take a very small positive value for the final temperature. In most cases this will give an unnecessarily long running time of the SA algorithm; in some cases it might stop the algorithm prematurely while there is still a reasonable probability of finding a better solution. A more sophisticated method is to estimate the probability of solution improvement. This could, for instance, be performed by maintaining statistics on the number of generated states and the number of accepted states [36]. Also, a threshold value can be incorporated for the maximum number of iterations in a stretch in which no improvement is obtained [7].

3.2.7 Cost Function


The cost function should formulate the properties of a problem in such a way that good properties are associated with low cost and bad properties are associated with high cost. Unfortunately, it is not trivial to define this cost function for typical VLSI problem instances. Finding a good cost function is an intuitive matter, based on the penalty it imposes on the regularity of the cost landscape and on the requirement of solving the right problem. The latter is a commonly neglected issue prevalent in many works [39].
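As a trivial illustration, a typical layout cost function is a weighted sum of competing terms; the terms and weights below are placeholders, not the cost function actually used in this thesis.

def layout_cost(area, wire_length, substrate_impact,
                w_area=1.0, w_wire=1.0, w_substrate=1.0):
    # lower is better; the weights express the relative importance of
    # the (conflicting) optimization targets
    return (w_area * area
            + w_wire * wire_length
            + w_substrate * substrate_impact)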

3.3 Concluding Remarks


An overview has been given of (global) optimization methods which are particularly well suited for a whole range of strongly nonlinear combinatorial problems. Based on the requirements for successful VLSI optimization, and reported experimental results on certain optimization approaches, simulated annealing appears to be a very promising algorithm to proceed with. Its flexibility and generality fit well with the heterogeneous set of constraints that is involved in layout generation. Furthermore, from an algorithmic point of view, SA is in principle easy to implement. In practice, the SA algorithm does not always return a solution that is close to optimal. Almost all implementations suffer from getting trapped in a local optimum. To what extent this unwanted phenomenon occurs seems to depend heavily on implementation quality and
the smoothness of the cost landscape; the smoother, the better. Since the smoothness is determined by the cost function, the problem representation, and the perturbation operators, a great amount of attention is required to choose them well.


Chapter 4

Optimization Approach Based on Simulated Annealing


From Chapter 3 it is clear that exact algorithms are not practical for NP-hard problems in the layout generation problem setting [40]. Thus, heuristics need to be used to tackle problems in this class. Stochastic heuristics which have been successfully applied to a broad range of problems in the VLSI domain are essentially based on the genetic algorithm or the simulated annealing algorithm, or even a combination of them [41, 42]. In specific cases, a simulated evolution approach might be preferable over simulated annealing, for example when subcircuits, forming a subset of a pre-defined total set, need to be combined to conform with a given set of input specifications [42, 43]. In such a problem setting, small changes in a solution typically induce large changes in the target function.1 The latter property is a serious concern in a simulated annealing approach, because it is well known that an irregular cost landscape deteriorates convergence and the quality of the final solution. The simulated evolution approach does not suffer much from large differences in fitness function values when a certain solution undergoes changes, because a set of solutions is generated during each evolution step. The generation of several candidate solutions has an averaging effect, thus relaxing the requirement for a smooth target function. If, however, a neighborhood relationship which defines a reasonably smooth cost landscape is readily available, the simulated annealing approach is a most promising candidate for tackling layout generation problems. Also, no evidence is available in which a simulated annealing approach can be consistently outperformed by a simulated evolution approach [44]. We choose to use simulated annealing (SA) as the basis of our optimization framework. The underlying reasons for this choice are:

- the conceptual simplicity of the simulated annealing algorithm,
- the robustness of the simulated annealing algorithm,
- the versatility of simulated annealing with respect to the type and extensiveness of problems and their constraints that can be handled,
- the ease of formulating a layout problem in terms of simulated annealing ingredients,
- the reported effectiveness of simulated annealing with respect to layout problems in current literature.

1 In connection with simulated evolution one should read fitness function, and in connection with simulated annealing one should read cost function. Moreover, maximizing fitness is equivalent to minimizing cost.


In the next sections we explain how the layout generation concepts are integrated into the overall simulated annealing optimization framework. We present a flow in which concepts that will be clarified in later chapters are briefly touched upon, in order to facilitate explanation of the integration of these concepts within the global framework. The main concepts are placement (Chapter 6), routing (Chapter 7), and physical phenomena such as crosstalk, parasitics, and process variations and their impact on layout generation (Chapter 8). The aforementioned concepts are formulated in a novel incremental approach, which is one of the main contributions of this work.

4.1 Optimization Flow


The overall flow of information within the SA-based optimization framework is shown in Figure 4.1. For the sake of presentation clarity, we start with a global description of the items in the flow diagram. Then, in the succeeding sections, the diagram items are explained in more detail, together with a justification of the choices made. The ellipses denote given or produced information, see also Figure 2.2. The rectangles denote actions that are taken. Furthermore, the diamonds indicate when a decision is taken. The shaded ellipses indicate that the information is (dynamically) computed inside the system and need not be supplied by an external source. The information in the shaded ellipses is very useful for monitoring optimization progress. It should be noted that the arrows in the flow diagram express a necessary flow of information. This does not mean that at certain stages in the flow no information from elsewhere is needed. To prevent cluttering the diagram with too much information, some of these arrows have been left out.

The first concept which is clear from the diagram relates to the representation of a placement and how it is modified in order to eventually find a good one. What we mean by good is determined by our cost function. Obviously, a placement is computed by first generating a relative placement and then computing the actual absolute positions of all blocks. Given a placement, a global routing is computed during the next step. The global routing determines where wires run over the chip area in a global fashion. The main purpose of global routing is to aid in the estimation of necessary routing space at an early stage in the process, and to facilitate finding better detailed routing solutions eventually. Due to the fact that a typical placement of blocks contains unoccupied space, there is some margin left to shift blocks around in this so-called slack space. The definition of the slack space on a block-by-block basis is called module expansion. We could see this as a virtual enlargement of the real block so that the amount of slack space is virtually minimized. It is intuitively clear that a local improvement is always possible, depending on the amount of slack space that is available around a block. Module expansion is also necessary to allow for enough routing space around a block. In cases where the available slack space is not sufficient for routing purposes, a module (which contains the block) needs to be expanded in a clever way such that routing requirements will be satisfied. The next action to take is substrate coupling effect minimization. Depending on specific substrate coupling sensitivities and module noisiness properties, a local improvement is computed so that the local impact of substrate coupling is decreased. After these steps, a detailed placement is obtained. At this point the annealing schedule comes into play. Depending on the current system temperature and a certain stop criterion on the temperature, it is determined whether to continue into the optimization loop or to proceed to the next
step in the sequence, which is the detailed routing step. If the choice is made to continue placement optimization, then subsequently the temperature is adjusted according to a predefined temperature cooling schedule. Next, the cost function is evaluated and, depending on the outcome, the current placement is accepted or rejected.2 If it is rejected, then the previous placement becomes the current placement and we proceed from this point. If the current placement is accepted, then we perturb the placement (by perturbing the sequence-pair placement representation) and compute a new placement. This loop is iterated until the stop criterion on the temperature evaluates to true. Detailed routing is then performed, which is assumed to be possible by virtue of proper previous optimization steps, and a final layout is generated.

Figure 4.1: Flow of the simulated annealing approach incorporating placement and global routing.
2 We assume that the optimization process yields only "yes" evaluations during the first iteration loop.


4.2 Problem Representation


We already pointed out the relevance of adopting a computationally efficient means to describe a placement of blocks. Moreover, an efficient (global) routing representation is also important, since placement and routing go hand in hand. These representations were implicitly used in the previous discussion of the simulated-annealing-based optimization framework. Consequently, a lack of efficiency in either of the representations will have a severe detrimental impact on overall performance. In other words, the efficiency of the problem representation has a significant impact on finding high-quality solutions within a minimal amount of computing time. Hereafter, a global idea is given of how the problem representation is fitted and integrated into the optimization framework.

4.2.1 Placement
Based on reasons which are given in Chapter 6, we use the sequence-pair placement representation. Basically, we can state that the sequence-pair (SP) representation fits well into an iterative optimization framework where (small) changes are applied to the placement during each iteration. Furthermore, the SP structure has advantageous properties in the context of mixed-signal layouts, such as a general non-slicing structure and low global sensitivity to small local changes. Also, important issues such as matching constraints [45], range constraints [46], boundary constraints [47], and interconnect constraints can be incorporated into a sequence-pair formulation. Formally, an SP consists of a pair of sequences [48]:

$\Gamma_+ = (\gamma^+_1, \gamma^+_2, \ldots, \gamma^+_n)$ and $\Gamma_- = (\gamma^-_1, \gamma^-_2, \ldots, \gamma^-_n)$

Every sequence is a permutation of the set of integers $\{1, 2, \ldots, n\}$, where $n$ is the number of modules to be placed. Consequently, the sequence-pair solution space contains $(n!)^2$ elements. As a result, changing a placement comes down to changing the associated permutation. In the optimization framework, the placement of blocks is split into three parts: a relative placement part, an absolute placement part, and a detailed placement part. This way, the placement problem can be handled more efficiently at an abstract level, while at the same time allowing a clear graphical interpretation of what is going on during optimization. The latter property leaves a window open for the designer to obtain insight into the procedure and tune the algorithms and (intermediate) results.
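A minimal sketch of the SP state and one primitive perturbation, swapping two modules in one of the two sequences, is given below; decoding a sequence pair into actual module coordinates is the subject of Chapter 6 and is not shown. Indices start at 0 here purely for convenience.

import random

n = 5
gamma_plus = list(range(n))      # first permutation of the module indices
gamma_minus = list(range(n))     # second permutation
random.shuffle(gamma_plus)
random.shuffle(gamma_minus)
sp = (gamma_plus, gamma_minus)   # the sequence-pair state

def perturb(sp):
    # swap two randomly chosen positions in a randomly chosen sequence
    seq = random.choice(sp)
    i, j = random.sample(range(len(seq)), 2)
    seq[i], seq[j] = seq[j], seq[i]

print(sp)
perturb(sp)
print(sp)                        # a neighboring placement state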

4.2.2 Routing
In order to handle complexity, the routing approach is split into separate steps. We adopt a two-step approach: a global routing step followed by a detailed routing step. The reasons for this choice are twofold.

1. It is too costly to compute the entire detailed routing information during each optimization iteration. Furthermore, it is intuitively clear that computing very detailed routing information is a waste of resources when the placement is not even close to being final.


2. It is doubtful whether a single-step approach can yield good solutions in a reasonable amount of time for larger problem instances, as contrasted with a two-step approach. Furthermore, a coarse first step can yield enough information to guide both the second refinement step and a possible local adjustment in placement, if necessary.

Global Routing

From a placement, a set of rectilinear wires connecting all modules in a net can be computed for all nets. The accuracy and the associated computational effort can be traded off against each other. Global routing serves two main purposes in layout generation, both of which are especially meaningful in the context of mixed-signal designs:

1. All modules should be connected in such a manner that all constraints on the pins in a net are met; otherwise the placement is inadequate. For the sake of simplicity, we assume that a Steiner minimal tree connecting all pins in a net implies adherence to the previous condition.

2. Enough routing space should be reserved for detailed routing along the sides of the modules. At least the minimum amount of space can be computed using global wiring information in addition to pin, net, and design-rule information. Furthermore, the requirement to minimize performance degradation due to crosstalk between adjacent wires belonging to different nets increases the minimum spacing.

In almost all integrated placement and routing approaches, only the first item is considered. And in virtually all cases a very crude global routing approach is taken. A de facto standard routing estimation methodology is minimal bounding box (MBB), or half-perimeter, routing (a sketch is given below). The main reason for doing so is ease of implementation. However, apart from the fact that a coarse routing yields, by definition, routing estimations with large deviations from an optimal routing solution, we also observe the following fundamental disadvantages:

- No (performance-driven) wire spacing and routing space estimation can be employed, due to lack of spatial information.
- The coarse routing values might actually conflict with (near-)optimal routing values, in the sense that the former might indicate that a certain placement induces a better routing while it is actually worse. An important consequence is dramatic deterioration of optimization results and, likely, of optimization convergence.
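For reference, the half-perimeter (MBB) estimate mentioned above amounts to no more than the following: the length of a net is approximated by half the perimeter of the smallest rectangle enclosing all its pins. The example coordinates are arbitrary.

def half_perimeter(pins):
    # pins: list of (x, y) coordinates of the pins of one net
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

print(half_perimeter([(0, 0), (4, 1), (2, 5)]))   # -> 4 + 5 = 9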

As a result, an accurate global routing methodology is proposed here, based on sparse routing graphs and fast and efficient Steiner minimal tree approximation heuristics. Chapter 7 elaborates extensively on global routing.

Detailed Routing

When problem instances become large, it is infeasible for a single-step routing approach such as classical area routing to determine exactly the spatial properties of each wire for all the nets in one sweep. The problem needs to be made manageable somehow. A hierarchical or multi-step approach is a common way to manage complexity. Within an iterative framework, a multi-step approach is particularly advantageous because computation time can be reduced by early detection of low-quality placements for which no adequate routing can be found.


Another advantage of a multi-step approach, which in our case resolves to a two-step approach consisting of global routing followed by detailed routing, is that detailed routing in itself is a very hard problem, which can be mitigated by a priori obtained global routing information. Although a solution for detailed routing is not proposed in this thesis, its relevance to high-quality mixed-signal layout generation should be clear.

4.2.3 Substrate Coupling


Parasitic coupling is a well-known phenomenon in circuit layouts, which is especially detrimental in mixed-signal designs. By virtue of accurate placement information inherent to the adopted representation, it is possible to reduce the effect of substrate coupling within a given placement. For this, we exploit the slack space around a module to shift the module in such a way that the negative coupling effect is reduced. This step can be performed without overhead in terms of computational complexity. Chapter 8 discusses this phenomenon and our approach in depth.

4.3 Perturbation Operators


In order to search the solution space for high-quality members, so-called perturbation operators must be defined to change a given solution into another solution. The set of perturbation operators, which can in turn be built from primitive perturbation operators, together with a generation function, determines the neighborhood of each solution. In mathematical terminology one can speak of a state to denote a solution. The neighborhood of each state is determined by the set of perturbation operators and the generation function. Herewith the state space is constructed. Furthermore, the cost function assigns a cost value to each state in the state space. Consequently, we can say that non-global local minima in the cost landscape are determined by the set of perturbation operators and the generation function. The latter decides on the magnitude and/or type of a perturbation [35, Chapter 10]. The global minima are, of course, solely set by the cost function.

In order to allow stochastic search algorithms to sample the solution space efficiently and effectively, we should take care of appropriately chosen perturbations. As Otten and Van Ginneken [35] already noted, the diameter of the search space should be made sufficiently small in order to reach every solution quickly at high temperatures. In practice this means that (initial) perturbation amplitudes should be sufficiently large. However, large perturbations imply large deviations in the cost function value, and thus a very irregular cost landscape. In order to decrease the cost deviations, at appropriate times the solution space should be sampled smoothly, too. As a consequence, at lower temperatures the perturbation amplitudes, and possibly the type of perturbations, should be adapted to comply with this requirement.

We define the following primitive perturbation operators on a sequence pair (X, Y).

P1. X-swap(i, j): interchange elements i and j in sequence X.

P2. Y-swap(i, j): interchange elements i and j in sequence Y.

P3. swap(i, j): interchange elements i and j in both sequences X and Y.

P4. rotate(i, φ): rotate element i over angle φ, in clockwise direction.

P5. mirror(i): mirror element i with respect to the x-axis or the y-axis.
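A minimal sketch of the swap primitives P1, P2, and P3 (illustrative Python, not the implementation developed in this thesis):

    import random

    def swap_in(seq, i, j):
        """Interchange elements i and j (by value) in a sequence."""
        a, b = seq.index(i), seq.index(j)
        seq[a], seq[b] = seq[b], seq[a]

    def perturb(X, Y):
        """Apply one randomly chosen primitive perturbation P1, P2 or P3."""
        i, j = random.sample(X, 2)            # two distinct module names
        op = random.choice(('P1', 'P2', 'P3'))
        if op in ('P1', 'P3'):
            swap_in(X, i, j)                  # P1: swap in sequence X
        if op in ('P2', 'P3'):
            swap_in(Y, i, j)                  # P2: swap in sequence Y
        return op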

Perturbation operators P1, P2, and P3 form a complete set in the sense that, within a finite number of steps, any sequence-pair configuration can be obtained from an arbitrary starting solution. We state this more precisely. From permutation theory we know that every sequence, which is a permutation of n elements, can be written as a product of disjoint 2-cycles, which we call swaps. Furthermore, this product is unique except for the order of the swaps.

Lemma 1 Given two arbitrary permutations π and σ of all elements in {1, 2, ..., n}, exactly n − 1 swaps are needed to go from configuration π to configuration σ (and vice versa), in the worst case.

Proof The sufficiency condition follows from the fact that there is a swap that will put at least one element in the right place. That element is left untouched afterwards. Furthermore, the last swap will necessarily put the last two elements in place. Thus, we never need more than n − 1 swaps. The necessity condition follows from inductive reasoning. It is easy to see that

(a1 a2 a3) = (a1 a2) ∘ (a2 a3)

holds and is minimal, i.e., the left-hand side cannot be represented with fewer than two swaps in the worst case, and these swaps are unique. Here ∘ represents concatenation, and (ai aj) is a swap of elements ai and aj. Let

(a1 a2 ... ak) = (a1 a2) ∘ (a2 a3) ∘ ... ∘ (ak−1 ak)   (4.1)

be a minimal-swaps representation of a worst-case permutation with k elements. For k = 3 this obviously holds. For a permutation with k + 1 elements we can write

(a1 a2 ... ak ak+1) = (a1 a2 ... ak) ∘ (ak ak+1)   (4.2)

Substituting (4.1) in (4.2) we get

(a1 a2 ... ak ak+1) = (a1 a2) ∘ (a2 a3) ∘ ... ∘ (ak ak+1)

Thus, for a (worst-case) permutation of k + 1 elements we need at least k swaps. Therefore, for a permutation of n elements we need n − 1 swaps, in the worst case.

Theorem 1 From a given arbitrary sequence pair (X, Y), we can create any other sequence pair (X′, Y′) using at most 2(n − 1) perturbations from the perturbation set {P1, P2}.

Proof Applying perturbation P1 (P2) on sequence X (Y) guarantees finding X′ (Y′) within n − 1 swaps, with the aid of Lemma 1.

In principle, perturbation operator P3 is redundant, since it is a concatenation of P1 and P2. However, P3 is an intuitively attractive perturbation operator which is fully symmetrical.


Moreover, it helps reduce the diameter of the search space, because typically the number of required swaps is lessened with the addition of P3. Perturbations P4 and P5 do not change the sequence pair. Perturbation P4, however, influences the absolute location of modules, while P5's only purpose is to minimize wire length.

4.4 Acceptance and Generation Functions


The acceptance function that is used within the proposed simulated annealing framework is the standard Metropolis criterion given by (3.3), where the probability of accepting a new solution is based on the difference in cost with the previous solution and the annealing temperature. More exactly, when the new cost is lower than the previous cost, the new solution is unconditionally accepted. In the case where the new cost is higher than the previous cost, the probability of acceptance becomes increasingly smaller with higher cost difference and lower temperature. A typical realization of this would look like

accept new solution  ⟺  random(0, 1) < exp(−ΔC / T)

where random(0, 1) is a random number generator which generates a real value between 0 and 1 with a uniform distribution, ΔC is the increase in cost, and T is the annealing temperature. As a consequence, at high temperatures almost all cost increases are accepted, effectively turning the algorithm into a random walk. At low temperatures, mostly only cost decreases are accepted.

The generation function (also called the selection function) is taken to be the identity function in the proposed framework, adopting the standard approach that is suggested in [35]. Although no attention is given to fine-tuning this parameter, it should be noted that its impact may be significant, as it embodies the effective solution space sampling behavior. In other words, the generation of moves which are going to be rejected with high probability is inefficient, and thus avoiding such generation is efficient. Of course, this is only practically effective when such moves can be identified relatively quickly, for instance by means of a distance association.
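Transcribed directly into code, the criterion reads as follows (a minimal sketch; function and variable names are illustrative, not from this work):

    import math, random

    def accept(delta_cost, temperature):
        """Metropolis criterion: always accept improvements; accept
        deteriorations with probability exp(-delta_cost / temperature)."""
        if delta_cost <= 0:
            return True
        return random.random() < math.exp(-delta_cost / temperature)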

4.5 Temperature Schedule


An adaptive temperature schedule is used, which is controlled in such a way that the annealing process stays in quasi-equilibrium and yet converges as quickly as possible to a global optimum. This approach is entirely adopted from Otten and Van Ginneken [35]. The decrements of the temperature are chosen in such a way that the steps do not disturb the equilibrium density too much. Although the approach taken is justifiable and gives good results in practice, it is still an active field of research in statistics to estimate the equilibrium density of an inhomogeneous Markov chain in order to determine how close we have come to an equilibrium distribution [49]. The interested reader is referred to [50] for an excellent recent overview, providing more insight into this field. The essential points in a temperature schedule are:


- initial temperature: The initial temperature should be high enough in order to guarantee independence of the final solution with respect to an initial solution.

- temperature decrement: The temperature decrement can be deterministic or stochastic. As yet it is unknown what type of temperature decrement yields the best results.

- final temperature: The final temperature can be fixed or dynamically computed as a function of several optimization parameters, such as the estimated standard deviation of the cost.

In our optimization framework we adopted the strategy of Otten and Van Ginneken [35, Chapters 8 and 11].

4.6 Stop Criterion


The stop criterion which is adopted in our framework is tacitly taken from [35]. Although it is noted in [35] that their observations should be applied with care in a general fashion, i.e., to arbitrary problem instances, we did not spend any effort on verification. In order to expect good and robust performance, we should assert all assumptions and observations in our problem setting. On the other hand, it cannot be denied that building a robust simulated annealing implementation appears to be an art [29]. Therefore, we accept some degradation in performance by not complying with all requirements. This is justified in two respects:

- in the light of the knowledge that it is always possible to improve performance by tuning, and

- by virtue of the fact that our goal is to demonstrate feasibility of concepts.

4.7 Cost Function


The cost function that is used in our simulated annealing optimization framework is

F = w_A (A / N_A) + w_L (L / N_L) + w_K (K / N_K)   (4.3)

where w_A, w_L, and w_K are user-specified weight factors between 0 and 1 that determine the relative importance of each term in the cost function. The normalization constants N_A, N_L, and N_K are determined in such a way that the weight factors have equal importance. Furthermore, A stands for chip area, L stands for wire length, and K stands for coupling impact. The cost function given by (4.3) is a generalization of the de-facto standard cost function used in literature, where all normalization constants are typically set to unity.
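A direct transcription of (4.3) might look as follows (a minimal sketch; the weight and normalization values are placeholders, not values used in this work):

    def cost(area, wirelength, coupling,
             w=(0.4, 0.4, 0.2),       # user-specified weight factors w_A, w_L, w_K
             norm=(1.0, 1.0, 1.0)):   # normalization constants N_A, N_L, N_K
        """Weighted, normalized cost in the spirit of (4.3)."""
        terms = (area, wirelength, coupling)
        return sum(wi * t / ni for wi, t, ni in zip(w, terms, norm))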

4.7.1 Implicit Cost Evaluation


It is well known that the terms in the aforementioned cost function typically conflict. Consequently, the optimization task is hampered by repelling forces, eventually leading to longer computation times and worse solutions.


For this reason, it would be intuitively better to use a single cost term in which no inherent conflict is apparent, and which captures the essence of the designer's specifications.

For instance, this could be accomplished by taking only the total wire length into account, since a short wire length normally means that blocks are placed close together. In cases where various placements of blocks exist with the same total wire length, the placement with the smallest chip area should be taken. This could be accomplished by using an additional area term with a small weight value. Another way is to translate wire length into wire area and implicitly incorporate this into the total chip area by expanding the modules in the placement. This approach is already quite sophisticated. For the sake of comparability with other published results, we will adhere to the general approach in which essentially both chip area and wire length are taken into account. However, experimental results with a single cost term are also given in Chapter 7.

4.8 Concluding Remarks


We gave an overview of the components which are used in our optimization framework for generating a mixed-signal layout. The integration of three main, strongly coupled, aspects was discussed: efficient placement representation, efficient global routing computation, and efficient substrate coupling impact minimization. However, we note that these ingredients are not sufficient to generate a complete mixed-signal layout. One of the missing components is detailed routing. Nevertheless, it should be clear from our setup that all requirements for proper mixed-signal layout generation can be complied with in our approach without encountering fundamental obstacles.

Chapter 5

Efficient Algorithms and Data Structures


Automation of processes cannot be kept separate from computer algorithms. Even more strongly, algorithms form an essential key element of any CAD tool. The efficiency of an algorithm has a direct impact on the performance of a CAD tool. Although it is not always clear from higher-level algorithmic descriptions, data structures are essential for manipulating data in an algorithmic environment. Data structures become especially important during the implementation phase of an algorithm; they can make or break the practical usefulness of an algorithm.

In this chapter we discuss data structures which are relevant in the context of mixed-signal layout generation. The intention is to present a non-exhaustive but representative set of tools which can be used to design efficient algorithms, and consequently an efficient overall system. The emphasis will be put on dynamic algorithms, that is, algorithms that can efficiently deal with continuously changing information over time, both in terms of required memory space and required computation time. It is clear that dynamic algorithms play a central role in CAD tools in general, with mixed-signal layout generation as a special case. Before going into detail on efficient algorithms and data structures, a few definitions are in order.

Definition 2 (Algorithm) An algorithm is any well-defined computational procedure that takes some input, or a set of inputs, and produces some output, or a set of outputs.

The mapping task of an algorithm is performed using sets of elements in which the data is represented, and operations that are defined on these sets. Unlike the static sets which are used in mathematics, the sets which are used in and are fundamental to computer science are highly dynamic. That is, the sets can grow, shrink or otherwise change over time.

Definition 3 (Data Structure) The representation of a finite dynamic set of elements, in combination with operations defined on this set, is called a data structure.

An implementation of an algorithm with a certain data structure is called a program. Important features of an efficient algorithm are:

- finiteness: the algorithm stops after a finite number of steps;

- correctness: the output of the algorithm complies with the pre-specified post-condition;

- efficiency: the number of primitive computer operations used to accomplish the desired mapping is as small as possible (within the limitations of the employed data structures).


5.1 Computational Model


We shall assume a generic one-processor random-access machine (RAM) model of computation as our implementation technology, and understand that our algorithms will be implemented as computer programs. In the RAM model, instructions are executed one after another, with no concurrent operations. The main performance measures we will be concerned with are run time and storage space, with a unit-cost measure. That is, every operation and every storage element has unit cost. Time is measured by counting the number of executed instructions, and space is measured by counting the number of used memory cells. Each memory cell can hold an arbitrarily large number.

Usually, it is very hard or impossible to compute the exact amount of run time an algorithm needs for a given input. This also applies to computing the exact storage space. Therefore, the performance measures are blurred to make them easier to estimate. Instead of determining the exact performance of an algorithm, the worst-case performance of an algorithm is determined.

Definition 4 (Worst-case Performance) The worst-case run time (space) performance of an algorithm is the maximum number of time (memory) units the algorithm needs to process an input I, relative to the input size n, over all possible inputs of size n.

The input represents the problem instance. Hence, the input size is synonymous with the problem instance size.

Definition 5 (Problem Instance Size) The size of a problem instance is equal to the minimum number of information elements needed to describe that problem in a specific representation.

5.2 Asymptotic Analysis


Although worst-case performance evaluation might be easier than determining the exact performance of an algorithm, in most cases the former is still non-trivial. The introduction of the so-called Big Oh (O) operator is very convenient here. It was originally introduced by Paul Bachmann in 1894 for asymptotic analysis, but it is a de-facto standard in computer science nowadays [51]. Dealing with the O operator is especially interesting when the problem instance size becomes large. In those cases the approximation error is negligible. On the other hand, when the size of the problem instance at hand is relatively small, the error may become unacceptably large. In cases where we may apply O-notation, its beauty becomes apparent in the fact that it suppresses unimportant detail and emphasizes salient features.

Essentially, the O-operator denotes a set of functions. Formally, for a given function g(n), we denote by O(g(n)) the set of functions

O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }   (5.1)

Similarly, the Ω-operator (Big Omega) and Θ-operator (Big Theta) are defined:

Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }   (5.2)

Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }   (5.3)

In words, the above relations indicate that if f(n) = O(g(n)), then f(n) is bounded from above by g(n) multiplied by a suitable constant, when n is sufficiently large. Furthermore, if f(n) = Ω(g(n)), then f(n) is bounded from below by g(n) multiplied by another suitably chosen constant, for sufficiently large n. It is not difficult to see that for any two functions f(n) and g(n), f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).

Graphic examples are shown in Figure 5.1. Note the abuse of the equality sign to denote membership of a set; it is a standard convention in asymptotic analysis.

Figure 5.1: Graphic examples of the O, Ω, and Θ notations.
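As a small worked example (not taken from the original text): for f(n) = 3n² + 5n we have 3n² + 5n ≤ 4n² for all n ≥ 5, so with c = 4 and n0 = 5 we get f(n) = O(n²). Since also f(n) ≥ 3n² for all n ≥ 1, f(n) = Ω(n²), and hence f(n) = Θ(n²).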

5.3 Computational Complexity


For the sake of comparing algorithmic performance, we want to know how well (or how poorly) a certain type of algorithm using specific data structures will perform. Asymptotic analysis is an excellent mathematical tool to measure algorithmic performance. This type of analysis in an algorithmic context is also called computational complexity analysis. We already mentioned worst-case analysis. In practice we can distinguish three types of analyses:

- worst-case analysis

- average-case analysis

- amortized analysis

Worst-case analysis is by far the most used analysis approach. An important reason for using worst-case analysis is that the occurrence of a problem instance that induces worst-case behavior might be disastrous. Maybe an even more important reason is the fact that worst-case analysis is typically far easier to perform than the other types of analyses. However, if the worst-case situation does not occur often, the analysis results might deviate severely from those of a more elaborate type of analysis.

Average-case analysis is concerned with the average computational complexity of an algorithm for a specific set of inputs. This type of analysis is the most accurate, but it is also very difficult to perform in practice. Moreover, the analysis results depend on the assumed distribution of input problem instances. To simplify the analysis, often a uniform input distribution is chosen. Unfortunately, this may not always be a good assumption.

In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. This type of analysis can be used to show that the average cost of an operation is small if one averages over a sequence of operations, even though a single operation might be expensive.


Amortized analysis differs from average-case analysis in that probability is not involved: no assumptions about the input distribution are made; there is only averaging over time. The averaging occurs over a worst-case sequence of operations. For more information, we refer the reader to [34].

5.4 Data Structures for CAD


The performance of an algorithm strongly correlates with the performance of the data structures that it uses to perform its task. It is therefore necessary to choose the proper data structures for a given algorithm with a specific functionality. Requirements for advanced data structures in the context of complex algorithms, which in turn consist of smaller algorithms, are flexibility and efficiency. Flexibility is needed to enable the usability of the data structure for a variety of operations. Efficiency is needed to guarantee low computational complexity, implying that the algorithm allows for substantial scaling. Moreover, high-performance data structures should also feature small constant factors, which are hidden in their complexity measure. For example, in terms of computational complexity, an algorithm requiring 100n time units and an algorithm requiring 2n time units to perform the same task would look identical in terms of the O operator: both are O(n). However, in practice the latter algorithm will be substantially faster than the former.

On a more practical level, a data structure should also not be so complex that it becomes overly difficult to implement correctly. Of course, this depends on the gain in computational complexity and the range of problem instance sizes that will be used. In this section we will present a few data structures which are especially interesting for CAD applications because of their inherent manner of representing data.

5.4.1 Corner Stitching


Corner stitching is a data structure for representing rectangles, or objects that can be segmented into rectangles, in a two-dimensional plane. It was introduced by Ousterhout [52] for the purpose of visually representing layouts that can be modified interactively. The strength of the corner stitching data structure lies in the facts that it is conceptually simple, and that there is a large number of operations that can be performed efficiently on it. Each rectangle is represented by its lower-left corner coordinate and its width and height. Additionally, there are four so-called stitches to neighboring rectangles; one for each direction (up, right, left, and down). Figure 5.2 shows a basic corner stitching rectangle including all four corner stitches. The name is due to the resemblance with a patched cloth.

Figure 5.2: A basic corner stitching rectangle.

The following operations are defined on the corner stitching data structure [52]:

- point finding: return the rectangle in which a given (x, y) location is located;

- neighbor finding: return all rectangles that touch a given side of a given rectangle;

- insert rectangle: insert a rectangle of given width and height at a given (x, y) location;

- delete rectangle: delete a rectangle from a given (x, y) position;

- area search: check if there are any rectangles of a certain type in a given area;

- area enumerate: enumerate all rectangles of a certain type in a given area.

A striking feature of the corner stitching data structure is its ability to represent both empty and non-empty regions in the plane. This notion can be generalized to more than two types of rectangles if necessary, without incurring any performance loss in terms of computational complexity. In fact, the corner stitching data structure is a generalization of the doubly-linked list data structure to two dimensions, where each list item covers a part of the plane. Figure 5.3 shows an example of a set of rectangles in the plane represented by the corner stitching data structure. Notice the white area, which represents unoccupied space, whereas the shaded area represents occupied area. The corner stitches are shown explicitly in the rectangular dashed-outline region. When browsing through the data structure, these corner stitches are used to go to a neighboring rectangle. From this figure it is also clear that examining physically close rectangles is a local operation which can be performed very fast.

Figure 5.3: A placement of rectangles in the plane explicitly represented using the corner stitching data structure.

Another feature of the corner stitching data structure is a property called the maximally horizontal empty tile. This means that an empty tile is always maximally extended in the horizontal direction, which is also shown in Figure 5.3, where the white unoccupied area is split into maximally horizontal (empty) rectangles. The corner stitching data structure performs well in practice due to its relatively simple structure. However, its implementation requires a lot of care to avoid some tricky pitfalls.


Its actual performance can vary quite a lot. For example, inserting a set of rectangles in the plane will be performed faster if less segmentation of the plane is induced around a rectangle during insertion. Thus, the order of insertion plays a role. Typically it is more advantageous to place the larger rectangles first and then the smaller ones. The rationale behind this is that large rectangles can shield a larger portion of the plane from other parts, so that less interaction is required. The reader is referred to [52, 15] for more information.

We conclude by giving Table 5.1 of relevant corner stitching operations.

Table 5.1: Corner stitching operations and their average computational complexities, with and without a hint. A hint is an auxiliary pointer to a proper object in the data structure. (Rows: insert, delete, neighbor enumeration, area enumeration, point finding.)

From this table we can see that a few operations can typically be performed in constant time, independent of the number of items already inserted into the data structure. Although most operations have a worst-case complexity of O(n), where n is the total momentary number of rectangles in the plane, this occurs seldom in practice. Normally, searching for a certain module in the data structure requires a considerable amount of effort on average. Typically, the actual number of relevant rectangles is equal to

n · (selected area) / (total rectangle area)   (5.4)

It is also shown in the table that a hint, which is an auxiliary pointer to some object (empty or non-empty) in the data structure, can significantly improve the average computational complexity. Of course, the strength of a hint is actually unleashed when it is chosen in such a way that it provides maximum gain. In practice this means that computing a hint should be performed much more efficiently than the average complexity of the operation without a hint. A worst-case sequence of operations will provide more insight into where the break-even point lies.

If we know in advance the (approximate) maximum number of objects which are going to be stored within the corner stitching data structure, a hash table can be used to improve some of the average complexities without a hint. The approach is as follows. Create a hash table which can store the objects, indexed by their, say, bottom-left coordinates. As this implies that each existing object can be found in constant time, the operations that involve a specific object to be found prior to performing the actual operation can be decreased in complexity.
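To make stitch-following concrete, the sketch below (illustrative Python, not thesis code) mimics Ousterhout's point-finding walk. It is simplified in that it assumes each directional stitch leads to a neighboring tile overlapping the search path, whereas a full implementation must also correct for corner misalignment:

    class Tile:
        """A corner-stitched tile: lower-left corner, size, and four stitches."""
        def __init__(self, x, y, w, h):
            self.x, self.y, self.w, self.h = x, y, w, h
            self.up = self.right = self.left = self.down = None  # corner stitches

        def contains(self, px, py):
            return (self.x <= px < self.x + self.w and
                    self.y <= py < self.y + self.h)

    def point_find(start, px, py):
        """Walk from an arbitrary start tile toward the tile containing (px, py)."""
        t = start
        while not t.contains(px, py):
            # Move vertically until the y-range matches ...
            while py < t.y:
                t = t.down
            while py >= t.y + t.h:
                t = t.up
            # ... then horizontally; this may invalidate y again, hence the outer loop.
            while px < t.x:
                t = t.left
            while px >= t.x + t.w:
                t = t.right
        return t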

5.4.2 Linked List


The list-based data structure is well known and covered in every textbook on data structures. However, for completeness we will discuss this common type of data structure briefly.


The simplest list-based data structure is the singly-linked list, where data items are linked to each other in a sequential, one-directional fashion. A somewhat more sophisticated list is the doubly-linked list, where the data items are connected to each other in a bi-directional way. Conventional lists are useful when dynamic sets need to be maintained and the primary operations are insertion of an element and deletion of an element. Also, enumeration of all elements in the set can be performed efficiently. Lists are not efficient when a specific element needs to be looked up in the set, because every element before that element in the list has to be looked at. For the lookup operation we need O(n) time in the worst case. Unfortunately, this is also the average-case computational complexity. Table 5.2 contains the computational complexities of list operations.

Table 5.2: List operations and computational complexities.

    operation     computational complexity
    insert        O(1)
    delete        O(n)
    find          O(n)
    enumerate     O(n)

Note that deleting an arbitrary item requires a find operation before the actual deletion. Only deleting an item with a known location, for instance at the head or tail of a list, can be done in constant time. The same holds for inserting an arbitrary item at the head or tail of the list. Note that operations on items with a known location in the list can be performed in constant time with the aid of a hash table. Of course, this approach is only useful when the maximum number of items in the list can be estimated beforehand and this number is much smaller than the universe of storable items.

Recently, a more powerful variant of the list-based data structure has been introduced by Pugh [53]. It is called the skip list, and it was proposed as an efficient alternative to balanced trees. The key ingredients in a skip list are: a logarithmic number of levels containing data items, and a probabilistic approach to skip pointers. Skip lists appear to have very good performance in practice and can do whatever a balanced tree can do, at least as fast. Where balanced trees become inefficient when objects are frequently inserted into and deleted from the set, skip lists take over by avoiding expensive re-balancing operations after each modification of the set. Last but not least, skip lists are easy to implement.

Figure 5.4 shows the basic notion behind the skip list data structure. In Figure 5.4(a) a conventional linked list is shown. In order to reduce searching time, an additional pointer is introduced with every other object. Each such pointer skips one object. The result is that searching time is reduced by half. This idea can be applied to every fourth, every eighth, every sixteenth pointer, and so on. Generally the maximum number of pointer levels is chosen to be logarithmic in n. It is now clear that each element can be found in O(log n) time using classical binary search principles. However, inserting or deleting an item, while maintaining the skip list properties, can be very awkward. This problem is solved by Pugh using a probabilistic approach. The skip list data structure can still degenerate into a linked list, but that probability is utterly small for any reasonable size of n. Table 5.3 shows the computational complexities of skip list operations.

In a development environment, however, it may be desirable to exactly reproduce results, or to compare results after only one specific setting has been changed.


Figure 5.4: The structure of the skip list is essentially a generalization (b) of the linked list structure (a). If probability is added, a more irregular structure is obtained (c), but typically it is very efficient for insertion, deletion and searching.

Table 5.3: Skip-list operations and computational complexities.

    operation       av. computational complexity
    insert          O(log n)
    delete          O(log n)
    find            O(log n)
    successor       O(log n)
    predecessor     O(log n)
    enumerate       O(n)

With a probabilistic data structure it might be troublesome to judge the impact of a change when the data structure performance also changes. Therefore, a deterministic algorithm might be preferable under these circumstances.
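As an illustration of these ideas, the following minimal sketch (illustrative Python, not the implementation used in this work) shows the search and the probabilistic insert; MAX_LEVEL and the coin probability 1/2 are arbitrary choices:

    import random

    class SkipNode:
        def __init__(self, key, level):
            self.key = key
            self.forward = [None] * level    # one forward pointer per level

    class SkipList:
        MAX_LEVEL = 16                       # roughly log2 of the expected item count

        def __init__(self):
            self.head = SkipNode(None, self.MAX_LEVEL)
            self.level = 1

        def find(self, key):
            x = self.head
            for i in range(self.level - 1, -1, -1):      # top level downwards
                while x.forward[i] and x.forward[i].key < key:
                    x = x.forward[i]                     # advance on this level
            x = x.forward[0]
            return x if x and x.key == key else None

        def insert(self, key):
            update = [self.head] * self.MAX_LEVEL
            x = self.head
            for i in range(self.level - 1, -1, -1):
                while x.forward[i] and x.forward[i].key < key:
                    x = x.forward[i]
                update[i] = x                            # rightmost node per level
            lvl = 1
            while random.random() < 0.5 and lvl < self.MAX_LEVEL:
                lvl += 1                                 # coin-flip level choice
            self.level = max(self.level, lvl)
            node = SkipNode(key, lvl)
            for i in range(lvl):
                node.forward[i] = update[i].forward[i]
                update[i].forward[i] = node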

5.4.3 Splay Tree


Binary search trees are well-known representations for information. An important feature of a binary search tree is the fact that an order relationship holds true for all nodes in each subtree of the tree. More specifically, all nodes left of a node x have a value smaller than the value associated with node x, and all nodes right of node x have a value larger than the value of x. An example is shown in Figure 5.5(a). In order to facilitate the presentation, we will adopt the examples and tree structure used by Sleator and Tarjan, the inventors of the splay tree data structure [54], which is a special type of binary tree data structure. An equivalent full tree representation of the binary search tree in Figure 5.5(a) is shown in Figure 5.5(b), in which each internal node has exactly two child nodes. The leaf nodes are drawn as triangles in this figure. The tree at the left side is obtained directly by contracting the leaf nodes and internal nodes as shown at the right side of the figure.

Figure 5.5: An example of a binary search tree in (a) a classical tree representation, and (b) an equivalent full binary tree representation used in [54].

When objects are inserted and deleted randomly, binary search tree performance is unmatched. However, if objects are inserted in order, the binary tree structure degenerates into a linked list, and performance plummets. A great amount of work has been spent on finding tree-balancing algorithms and techniques to overcome the effect of degeneration. The result is a colorful set of balanced binary tree algorithms: B-trees, AVL trees, red-black trees, randomized binary trees, splay trees, and many more.

The splay tree data structure is a very efficient data structure in that it has amortized computational complexity O(log n) per operation, where the time per operation is averaged over a worst-case sequence of operations. Essentially, each splaying operation, which is a simple restructuring heuristic, resembles a move-to-front technique applied to the splayed item, plus a shortening of the height of the current tree. Exactly three different splaying cases can occur. These cases are shown in Figure 5.6.

Figure 5.6: All three splaying cases. Each case has a symmetric variant which is not shown. The accessed node is x. (a) Zig: terminating single rotation. (b) Zig-zig: two single rotations. (c) Zig-zag: double rotation.

To splay a tree at a node x, we repeat the aforementioned primitive splaying operations until x is the root of the tree. Splaying a node x at depth d takes O(d) time [54], that is, time proportional to the time to access node x. Splaying not only moves x to the root, but roughly halves the depth of every node along the access path. This halving effect makes splaying efficient. Note that splaying, and consequently a splay tree, is fully deterministic. It is clear that under some conditions on the access, insertion, and deletion probabilities over the universe of elements, the splay tree data structure can perform substantially better than in the worst case. From experiments [55] and experience in the field, especially with respect to randomized binary search trees [56], splay trees typically outperform other balanced tree implementations. Therefore, we have chosen splay trees as our primary balanced tree data structure for implementation. Table 5.4 shows the computational complexities of splay tree operations.

Table 5.4: Splay-tree operations and computational complexities. The worst-case and average-case complexities are equal.

    operation       computational complexity
    insert          O(log n) amortized
    delete          O(log n) amortized
    find            O(log n) amortized
    successor       O(log n) amortized
    predecessor     O(log n) amortized
    enumerate       O(n)
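The three splaying cases translate into a compact routine. The following is a minimal bottom-up sketch (illustrative Python with parent pointers, not the implementation developed in this thesis):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = self.right = self.parent = None

    def _rotate_up(x):
        """Single rotation that moves x one level up, preserving BST order."""
        p = x.parent
        g = p.parent
        if p.left is x:              # right rotation around p
            p.left = x.right
            if x.right: x.right.parent = p
            x.right = p
        else:                        # left rotation around p
            p.right = x.left
            if x.left: x.left.parent = p
            x.left = p
        p.parent = x
        x.parent = g
        if g:
            if g.left is p: g.left = x
            else: g.right = x

    def splay(x):
        """Move x to the root using zig, zig-zig and zig-zag steps."""
        while x.parent:
            p, g = x.parent, x.parent.parent
            if g is None:
                _rotate_up(x)                        # zig
            elif (g.left is p) == (p.left is x):
                _rotate_up(p); _rotate_up(x)         # zig-zig
            else:
                _rotate_up(x); _rotate_up(x)         # zig-zag
        return x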

5.4.4 Hash Table


Applications that require a dynamic set supporting only the dictionary operations insert, delete, find, and enumerate could employ a data structure such as the hash table. A hash table is an effective data structure for implementing dictionaries. Although searching for an element in a hash table can take as long as searching for an element in a linked list, i.e., O(n) time in the worst case, with n the size of the table, in practice hashing performs extremely well. Under reasonable assumptions, the expected time to search for an element in a hash table is O(1).


In fact, a hash table is a generalization of an ordinary array in which direct addressing is

5.4 Data Structures for CAD

53

performed in a clever way. A hash table becomes especially interesting if the number of keys to be stored at any moment in time is small compared to the size of the key space. Instead of using the key directly to access a position in the array, the array index is computed from the key, which is called hashing. This way, the size of the array can be kept proportional to the number of keys instead of the size of the key space, as is the case for ordinary array storage. Figure 5.7 graphically shows the principle of hashing. Keys from the universe of keys U are mapped to slots of the hash table T using a hash function h. Due to the fact that the size of T is much smaller than the size of U, and the hash function is not perfect in the sense that it does not know in advance which keys from U are going to be stored in T, collisions can occur. That is, some keys will be mapped to the same slot position in T. An efficient way to resolve collisions is by means of chaining, i.e., keeping colliding keys in a list. In the shown example, two keys collide and are chained.

Figure 5.7: The principle of hashing, where collisions are resolved by chaining.

The essential elements for an efficient hash table implementation are the hash function and the capacity handling of the hash table. A hash function

h : U → {0, 1, ..., m − 1}   (5.5)

is said to hash an element k, where U denotes the key space, to slot h(k) in the hash table. Since hashing is performed during every hash table operation, the hash function needs to evaluate quickly and have good distributing properties. Knowledge of the probability distribution of the input elements will facilitate the construction of a good hash function. Typically, heuristic techniques are employed for this purpose. In the case where not much is known about the input distribution, except for the fact that it is quite unpredictable, general approaches can be taken. A common technique is the division method, yielding for instance

h(k) = k mod p   (5.6)

where p should preferably be a prime number at least as large as the number of slots in the table, and not too close to an exact power of 2. Instead of mapping a single key, it is also easy to map a pair of keys (k1, k2), which can also be interpreted as a point in the plane, into a hash table. The notion of double hashing fits


as a glove in this respect. In double hashing, the hash function is

h(k1, k2) = (h1(k1) + k2 · h2(k1)) mod m,  with  h1(k) = k mod m  and  h2(k) = 1 + (k mod m′)   (5.7)

where m is prime and m′ is a positive integer smaller than m, for instance m′ = m − 1. Because generally we do not know which elements of U are going to be stored in the hash table, by definition so-called collisions will occur. An effective way to handle collisions is by means of chaining. The chaining principle essentially turns each slot in the hash table into a linked list. If the size of the hash table is well chosen, the expected length of a chain is very small and does not depend on n. Regardless of whether or not collision resolution is employed, the number of slots in a hash table needs to be large enough to avoid deterioration of performance. If the set of elements that is going to be stored is known in advance, a so-called perfect hash function can be computed which guarantees a one-to-one mapping into the hash table. Of course, this is a trade-off between the performance gain from avoiding collisions and the effort needed to compute a perfect hash function.

Table 5.5 shows the computational complexities of hashing operations. Collision resolution by chaining is especially attractive when the number of keys in the hash table approaches the number of slots in the hash table; inserting a new key always takes O(1). However, when the number of keys in the hash table grows significantly larger than the hash table size, then finding and deleting a key is performed proportionally slower.

Table 5.5: Hash table operations and computational complexities. Note that m is the number of slots in the hash table and not the number of elements currently inserted in the table.

    operation     av. computational complexity
    insert        O(1)
    delete        O(1 + n/m)
    find          O(1 + n/m)
    enumerate     O(n + m)

If a truly dynamic set has to be maintained and the number of items in the set is unknown in advance, then a hash table is likely not the best choice.
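A direct transcription of (5.6) and (5.7) might look as follows (a minimal sketch; the prime constants are arbitrary illustrations, not values used in this work):

    def h_division(k, p=1009):
        """Division-method hash (5.6); p is a prime, chosen here arbitrarily."""
        return k % p

    def h_double(k1, k2, m=997):
        """Double hashing (5.7) mapping a pair of keys, e.g. a point (k1, k2)."""
        h1 = k1 % m                  # primary hash
        h2 = 1 + (k1 % (m - 1))     # secondary hash, never zero (m' = m - 1)
        return (h1 + k2 * h2) % m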

5.4.5 Priority Queue


In many applications we need to maintain a set of elements that changes dynamically over time. Each element has an associated value called a key. Furthermore, typically the following operations are required:

- insert(S, x): inserts element x into the set S;

- minimum(S): returns the element of S with the smallest key;

- extract-min(S): removes and returns the element of S with the smallest key.

A data structure with the aforementioned properties is called a priority queue. One application of priority queues is to schedule jobs on a shared computer. It also has great utility in VLSI design problems. Most (practical) implementations of efficient priority queues have an amortized computational complexity bounded by O(log n), where n is the momentary number of elements in the queue.
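As a usage illustration (not part of this work), Python's standard binary heap realizes exactly these operations within the O(log n) bound:

    import heapq

    pq = []                                   # the set S, kept in heap order
    heapq.heappush(pq, (3, "route net A"))    # insert(S, x) with key 3
    heapq.heappush(pq, (1, "place block B"))
    heapq.heappush(pq, (2, "update cost"))

    smallest = pq[0]                          # minimum(S): peek in O(1)
    key, job = heapq.heappop(pq)              # extract-min(S): O(log n)
    # key == 1, job == "place block B"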

5.4.6 Other Advanced Data Structures


Research on data structures is a very active field. Therefore, it is nigh impossible to present all state-of-the-art works in a dissertation that focuses on the combination of high-performance data structures and electronic design automation. One reason is the fact that a good data structure in theory need not imply a good data structure in practice, and vice versa. Since we want to have the best of both, we have to settle for data structures that have been thoroughly investigated both in theory and in practice, so that we can rely on them and use them as building blocks. Another reason is the fact that research time is limited. As a consequence, a trade-off must be made between application-specific data structures (optimal for a smaller range of applications, but with higher performance) and more generic data structures (usable for a broader range of applications, but with lesser performance).

The interested reader might consult [34] for a good overview of other advanced data structures such as Fibonacci heaps and red-black trees. Also, AVL trees [57] are worth mentioning. Last but not least, the Van Emde Boas data structure, also known as a stratified tree, is an extended priority queue that has unmatched performance. A fundamental limitation is the restriction that the universe of keys is the set {1, 2, ..., N} [58, 59]. A significant improvement with respect to storage requirements was proposed by Mehlhorn and Näher [59], improving the previous space bound to O(n), with n the momentary number of elements in the tree.

5.5 Concluding Remarks


An overview is given of advanced data structures which can be successfully used in CAD tools; they are especially interesting, as will become clear from later chapters, in connection with mixed-signal layout generation. Based on specific properties in terms of computational complexity, a data structure (or a combination of data structures) can be selected to perform a specific task with as low a computational complexity as possible. Practical considerations, such as performance for typical instances and implementation complexity, are also important points to consider.


Chapter 6

Placement
When a circuit has been designed in terms of a netlist connecting (properly sized) building blocks, the layout phase follows. This part of the design cycle is called physical design, and for contemporary mixed-signal designs this phase is becoming increasingly more important. In fact, it is a dominant limiting performance factor of any state-of-the-art integrated circuit. Two important issues in physical design are placement and routing. This chapter focuses on the placement problem.

First we define the placement problem. Then we give an overview of several approaches to solve the placement problem. Based on our requirements on placement quality and on placement-related issues such as substrate coupling and matching, a choice is made regarding the approach for tackling the placement problem. We will elaborate on an efficient placement representation, which is known as the sequence-pair structure. Its theoretical properties are discussed in detail. Moreover, we unify new findings with known theories and algorithms. Theoretical fundamental lower limits on computational complexity are given with respect to state-of-the-art approaches to placement computation using the sequence pair representation. Motivated by promising theoretical results, an incremental placement computation approach is devised which has very attractive features in a simulated annealing optimization environment. Experimental results are shown to demonstrate the effectiveness and efficiency of the incremental approach. We proceed by discussing an important extension to standard placement, which is constrained module placement, in which modules can be constrained to a prescribed location in the plane or forced to be placed at one of the chip boundaries. An improved robust approach is proposed, and its effectiveness and superiority over the latest published works is demonstrated by experiments.

Let us first specify more exactly what is meant by a placement.

Definition 6 (Placement) A set of given rectangular blocks which are placed in a two-dimensional plane is called a placement.

Since no restrictions are put on possible overlap of blocks, clearly not every placement is practical. Therefore, a feasible placement is defined here as follows.

Definition 7 (Feasible Placement) A placement in which no overlap of blocks occurs is called a feasible placement; otherwise it is called infeasible.


The blocks that are used in a placement are normally of fixed size, but it is also possible to take blocks with flexible sizes. Those flexible blocks, also called soft blocks, can be taken from a given set of candidate blocks, under an aspect ratio constraint, or some other mathematical function constraint. Here, the placement problem is defined as follows.¹

Problem: The placement problem.
Instance: A set of blocks of given sizes; a set of pins, of which a subset is at the circumference of each of the blocks, representing the connectivity information between the blocks; an objective function f which, for instance, captures the total length of interconnecting wires and/or the area of the smallest enclosing rectangle around all blocks.
Solutions: All feasible placements, with all possible orientations of the blocks.
Minimize: f.
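As an illustration of Definition 7 (a minimal sketch, not part of this work), feasibility of a placement of axis-aligned blocks can be checked with a pairwise overlap test:

    def overlaps(a, b):
        """Axis-aligned rectangles a, b as (x, y, w, h); touching edges are allowed."""
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def is_feasible(placement):
        """O(n^2) check that no two blocks in the placement overlap."""
        blocks = list(placement)
        return not any(overlaps(blocks[i], blocks[j])
                       for i in range(len(blocks))
                       for j in range(i + 1, len(blocks)))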

The classical term floorplanning is strongly related to placement in that it also deals with the placement of objects. However, the approach of floorplanning is different, because it divides the two-dimensional plane into rooms which are big enough to hold all (flexible) objects. This way, overlap is avoided by construction. Moreover, empty area is not explicitly represented by a floorplan. Before proceeding, let us define a floorplan.

Definition 8 (Floorplan) A floorplan is a data structure that captures the relative positions of non-overlapping objects that fully cover a certain rectangle in the 2-dimensional plane [60].

The above definition is a sensible special case, in the current context, of the general definition which was re-coined by Otten recently [60]. Consequently, this notion of a floorplan is similar to the relative topological placement representations which can be found in many recent works, e.g., [61, 62, 63, 64]. In this respect, floorplanning can be compared with feasible placement computation using a topological placement representation. The main difference is that typical (feasible) placement computation deals with fixed-size blocks. When, instead of fixed-size blocks, variable-size blocks (also called soft blocks) are used, the placement problem is generalized into a floorplanning problem. The result of a floorplanning phase is a sized floorplan. The latter is defined as follows.

Definition 9 (Sized Floorplan) A sized floorplan is a floorplan in which each room contains exactly one block, and the block is not larger than the room.

Note that the word floorplan, instead of sized floorplan, is also used in literature to denote the result of a placement phase which contains absolute position and size information. Hereafter, the term placement is used to denote a sized floorplan, even in conjunction with soft blocks. Formally, the floorplanning problem can be defined as follows.
¹ We use P(S) to denote the power set of a set S.


Problem: The floorplanning problem.
Instance: A set of flexible blocks, and a sizing (or shape) function that selects a shape alternative for each block; an objective function f which, for instance, captures the total length of interconnecting wires.
Solutions: All sized floorplans, with all possible combinations of shape alternatives, and all possible relative topologies.
Minimize: f.

We will classify placement representations into slicing and non-slicing. The reason for this classification is the obvious difference in generality. Figure 6.1 shows an illustrative example of a non-slicing placement, which is defined as follows.

Definition 10 (Slicing) A placement is slicing if and only if it can be obtained by complete recursive bisection of the placement area. If slicing cannot be recursively continued down to the lowest level, a placement is called non-slicing.

Figure 6.1: An example of a non-slicing placement.

The main incentives for using slicing representations over non-slicing representations are the following.

- Some placement-related problems which are NP-hard for non-slicing placements can be reduced to polynomial-time problems for slicing placements.

- Several useful properties can be attributed to slicing placements, of which conflict-free channel routing sequence application is the most prominent.

- A hierarchical design methodology matches well with the slicing floorplan methodology.

Hence, it is clear that both slicing and non-slicing representations have advantages and disadvantages. In the following sections we argue that the so-called sequence pair representation is most suitable for use in a mixed-signal layout generation framework.

6.1 Previous Work


Numerous people have contributed approaches to solve the VLSI placement problem. Algorithms based on principles from various fields have been introduced in order to find better solutions for this intrinsically difficult problem, which is known to be NP-hard [65].


Due to this complexity, it is impractical to try to find optimal solutions for any but the smallest problem instances. It is not our intention to give an exhaustive overview. Firstly, the amount of published literature is too large to describe extensively in this thesis. Secondly, it would lead us too far from the purpose of this section, which is to discuss candidate placement approaches. We refer the reader to the good overviews in [14, 15] and the references therein.

Our purpose in using the phrase placement approach instead of placement algorithm is that the former is more generic. For instance, a placement could be obtained using a force-directed method with a general (non-slicing, overlap-allowed) representation of blocks. The placement algorithm is then clearly the force-directed method, but the placement approach is the general representation of the blocks which is employed while placing using the force-directed method. Another combination could be to use the force-directed method with a slicing placement representation. Since the representation of a placement has a great impact on the performance of a placement algorithm, both in terms of speed and solution quality, it is sensible to discuss this in more detail.

Otten [66] was among the first to introduce the notion of floorplanning, in the early eighties. Motivated by this concept, researchers began to look for special cases which could be applied to digital VLSI circuits without limiting design freedom in a negative sense. One of the most prominent special cases was the slicing floorplan structure, for which certain intractable problems reduce to polynomially solvable cases. This important property has been the main reason for using the slicing floorplan approach. An efficient floorplanning approach is described by Wong and Liu in [67].

Initially, the slicing floorplan approach was also applied to analog designs. However, it was soon realized that the slicing structure is too restrictive for analog layout [7]. Consequently, a more general placement approach was adopted by members of the analog layout design community. Actually, the most general placement approach of all was initially used for this purpose; blocks were allowed to be placed at arbitrary positions in a 2-dimensional plane. Thus, the representation allowed overlap of blocks. One of the first works in this respect is due to Jepsen and Gelatt [68]. Subsequent works, which extended and refined the original concept, are due to Sechen [41] and Lampaert [8]. Although the general overlapping placement approach produced promising results, its fundamental flaws prevented researchers from building a viable mixed-signal layout generation system for larger designs. Efforts to refine implementations and tune the layout system to improve performance have been, and can only be, successful up to an extent.

Fortunately, a great deal of research effort has been put into the design of efficient non-slicing placement representations. Murata et al. [48] developed one of the first efficient general placement representations, called the sequence pair structure. Some other relevant works are due to Nakatake et al. [63], who developed the bounded slice-line grid structure. Very recently, the O-tree structure was introduced by Guo et al. [62] and independently by Takahashi [64]. A host of extensions and refinements of the original O-tree concept followed rapidly [69, 70]. These representations were soon adopted by others for use in analog layout generation systems [45].


6.2 Effective and Efficient Placement


Computational efficiency is of paramount importance in connection with the placement problem, since it is NP-hard. The mandatory use of heuristic methods, typically featuring a massive number of iterations, to obtain an acceptable solution in a reasonable amount of time leads to the intuitive thought of using all available information as well as possible, without introducing useless redundancy. The phrases as well as possible and useless redundancy will be made explicit in this and succeeding sections. In order to achieve the goal of an effective and efficient placement method, a practical requirement on the abstract representation of a placement is so-called P-admissibility. We say that the solution space of a representation is P-admissible if it satisfies the following four requirements [48]:

- the solution space is finite,

- every solution is feasible,

- the mapping of a representation into a placement can be performed in polynomial time (P),

- the solution space contains an optimal solution (admissible).

The first requirement is quite weak, because finiteness can have a near-infinite appearance [71]. Requirements two and four are obvious. The third requirement is also quite weak, since polynomial computational complexity includes a linear algorithm, but also an O(n^c) algorithm, where c can be a large constant. As a consequence of the first and the third requirements, we can distinguish various representations within the boundaries of P-admissibility. The computational complexity associated with a complete placement representation is a combination of essentially two properties of the representation: 1) the solution space size, and 2) the computational complexity of computing a specific placement. Since scalability, i.e., the computational behavior of a system as a function of the input instance size, is becoming increasingly more important, the use of asymptotic complexity measures is fully justified. In order to make a proper choice on which type of placement representation to use, it is wise to create an overview of important and relevant representations. The final choice of placement representation is made based on a trade-off between

- computational effort,
- generality of representation,
- ease of mathematical manipulation.

Let us first restate the requirements for an efficient mixed-signal layout generation tool from a conceptual point of view. First of all, it is well-known that matching of both wiring and modules is extremely important in analog circuit layout. Therefore, representations that restrict the generation of matching-aware layouts should not be used. Thus, non-slicing placement representations are more suitable. Second, the system should have good scaling properties, which means that the computational complexity should be as low as possible. Moreover, in the light of an optimization algorithm which is going to be employed to compute a (near) optimal solution, preference might be given to a specific type of representation


which can possibly exploit information efficiently. Third, to achieve efficient usage of computational power and ultimately obtain a high-quality layout in several respects, better understanding of the mechanisms and parameters that control the overall layout quality is required. Therefore, more insight into the representation, especially with respect to its mathematical properties, is of importance. A major benefit of identification with known mathematics is the possibility to use a host of existing off-the-shelf techniques and algorithms.

Table 6.1: An overview of placement representations and their associated solution space size. It is indicated whether or not a representation can represent a non-slicing (NS) floorplan. PE indicates the computational complexity of a single full placement evaluation for a state-of-the-art implementation.

representation                       NS    solution space size   PE               refs.
flat (Jepsen-Gelatt)                 yes   unbounded             -                [68]
Polish expression                    no    -                     O(n)             [66]
normalized Polish expression         no    -                     O(n)             [67]
sequence pair (SP)                   yes   (n!)^2                O(n log log n)   [48]
bounded sliceline grid (BSG)         yes   -                     O(n^2)           [63]
ordered tree (O-tree)                yes   -                     O(n)             [62]
labeled ordered tree (LOT)           yes   -                     O(n)             [64]
B*-tree                              yes   -                     O(n)             [69]
binary tree                          yes   -                     -                [72]
corner block list                    yes   -                     O(n)             [73]
topological relation & orientation   yes   -                     -                [40]

Table 6.1 gives an overview of known placement representations. It also shows the size of the associated solution space of each representation. Also, the generality of a representation

is indicated in the column with heading NS (non-slicing). From the above table it is clear that there is a big difference in the size of the solution space of the representations. Although an indication is given regarding slicing properties of the representations, it does not cover all aspects of a flexible placement representation. This will be further explained in the next section. When a placement representation is used in an iterative approach, it is of utmost importance that the computation of a placement from an abstract representation is very fast. Also, scalability of the placement evaluation step is a major concern [39]. Therefore, the computational complexity of a single placement evaluation (PE) step is also shown in Table 6.1 in the column headed by PE. Obviously, O(n) is the best possible complexity when a from-scratch computation is desired. Both SP and BSG have super-linear complexities. However, the given values are based on the latest published results. Due to the fact that no proof of optimality is known for both the SP and BSG algorithms, we may conclude that improvement is not impossible. Summarizing, we have two types of representations:

- general non-slicing representations, which have no layout restrictions, and
- specific restricted representations, which have layout limitations; typically these representations are very efficient, albeit useful only in cases where such a limitation is allowed.

In the context of mixed-signal layout generation, restrictions on the layout form a bottleneck. Thus, general non-slicing representations are preferable.

6.3 Representation Generality, Flexibility and Sensitivity


Normally, generality of a representation refers to whether the placement is slicing or non-slicing. We observe that there are more factors that determine the usefulness of a representation. Two important factors are flexibility and sensitivity of a representation. These two terms are explained hereafter. Flexibility refers to the property of representing most, preferably all, of the meaningful solutions. A meaningful solution is defined next.

Definition 11 (Meaningful Placement) A feasible placement in which blocks that are constrained to be adjacent can indeed be placed that way, without changing orientation or topology of other blocks, is called a meaningful placement.

Clearly, not every feasible placement is meaningful, as feasibility means no overlap, but it does not impose any constraints on proximity of certain blocks. In mixed-signal layouts, the possibility to enforce spatial proximity (or minimum distance) between certain blocks is of utmost importance. Thus, it is of interest to choose a placement representation which holds as many meaningful solutions as possible. Although recent attempts have been focused on finding a representation with a solution space as small as possible, it should be noted that a small size of the solution space does not necessarily imply desirable results. This is a major discrepancy of packing-centric views, i.e. the tightest non-overlapping placement is not always the best placement in a realistic design. For instance, in Figure 6.2(a) a (meaningful) placement is shown which can not be represented by either the O-tree, LOT, or B*-tree representation. The underlying reason is that all three representations rely on a Tetris-like² block dropping procedure. Therefore, a block can never hover in the air. BSG and SP, on the other hand, can represent such a meaningful placement because they rely on relative 2-dimensional information, whereas the O-tree structure is essentially 1-dimensional. Therefore, a block can be placed such that it hovers above another block. Figure 6.2(b) shows a placement that can be represented by any of the methods mentioned in Table 6.1. Another property which contributes to the usefulness of a representation is the sensitivity of a placement to (small) non-topological³ changes. For example, in the case of soft-blocks, changing the aspect ratio of a block might dramatically change the positions of several other blocks. Rotation of a block might have the same effect in a very sensitive placement representation. Figure 6.3(a) shows an illustrative example of the large sensitivity of a labeled ordered tree (LOT) placement. If the width of one block is decreased a little, such that the blocks beneath it no longer reach underneath it, the block falls down to the bottom boundary of the chip area, which is an inherent property of LOT placement (and similar representations such as O-tree and B*-tree).
2. The popular computer game in which blocks are dropped down.
3. A topological change is a change in the relative relationships between blocks.



Figure 6.2: Two meaningful placements; (a) cannot be represented using an O-tree(-like) representation, while (b) can.

In contrast, a sequence pair placement is shown in Figure 6.3(b). Clearly, the sequence pair representation is much less sensitive to small non-topological changes, since the location of a block is independent of, or at most linearly dependent on, the dimensions of any other block.


Figure 6.3: The sensitivity of (a) a labeled ordered tree placement is clearly much larger than that of (b) a sequence pair placement. For instance, when one module is made a little smaller, another module moves to the bottom boundary of the placement area.

Summarizing, Table 6.2 gives for the placement representations mentioned in Table 6.1 their major advantages and disadvantages. These results will have a significant impact on the overall best candidate for placement representation in the context of mixed-signal layout generation. Many of the representations in Table 6.2 have been generalized to placement of rectilinear objects at the cost of increased complexity [74, 75, 76, 77, 78]. Also, range constraints, which cover both pre-placed blocks and boundary-constrained blocks, have been considered [79, 46, 77, 47]. A few of the representations have been adapted to take into account a very important analog-design-related issue which is called matching [80, 45, 72].


Table 6.2: An overview of placement representations with their advantages and disadvantages for mixed-signal layout generation.
flat (Jepsen-Gelatt)
    advantages: merging is possible; general flexible representation
    disadvantages: unbounded solution space

Polish expression
    advantages: conflict-free channel routing
    disadvantages: restricted slicing placement

normalized Polish expression
    advantages: conflict-free channel routing; confined solution space
    disadvantages: restricted slicing placement

sequence pair
    advantages: general flexible representation; low sensitivity; symmetry
    disadvantages: larger solution space; super-linear placement computation complexity

bounded sliceline grid
    advantages: general flexible representation; low sensitivity
    disadvantages: quadratic placement computation complexity

ordered tree
    advantages: non-slicing placement; linear-time placement computation complexity
    disadvantages: very sensitive to small changes

labeled ordered tree
    advantages: non-slicing placement; linear-time placement computation complexity
    disadvantages: very sensitive to small changes

B*-tree
    advantages: non-slicing representation; linear-time placement computation complexity
    disadvantages: large sensitivity

binary tree
    advantages: efficient structure; use of known tree-based algorithms
    disadvantages: no fundamental improvement over underlying representation

corner block list
    advantages: non-slicing representation; linear-time placement computation complexity
    disadvantages: empty space is not represented

topological relation and orientation
    advantages: non-slicing representation
    disadvantages: extremely large solution space; solution space includes infeasible placements; cannot handle regular structures


Generally, if adding constraints reduces the solution space size, this occurs at the cost of increased single placement computation effort. The final decision on which placement representation suits us best is based on our requirements, which in order of decreasing importance are:

- maximal generality and flexibility in order to fit mixed-signal and analog issues;
- low computational complexity to evaluate a placement;
- a small change in the abstract representation is associated with a small change in placement, essentially implying that the cost landscape is smooth;
- a small solution space, so that searching for a good solution can be done more efficiently.

The previous discussion justifies the choice of the sequence pair representation for use in a mixed-signal layout generation framework. Details on this representation will be given hereafter.

6.4 Sequence Pair Representation


The sequence pair (SP), recently introduced by Murata et al. [48], can efficiently represent any (topological) placement of rectangular modules, mainly because of its general non-slicing structure and its inherent property of representing meaningful solutions. To facilitate understanding of the abstract notion of the sequence pair representation, Figure 6.4 shows a few conceptual aspects of a placement of rectangular blocks in relation with sequence pair properties. The placement or layout space is the 2-dimensional plane spanned by the x- and y-axis. The x-axis naturally corresponds to the horizontal direction and the y-axis corresponds to the vertical direction. The sequence-pair space, spanned by the α- and β-axis, is a grid space

where the grid size is n × n, with n the number of modules in the placement space. Furthermore, we define four disjunct directions, which we call above, below, left of and right of. The first two directions align with the vertical axis, and the last two directions


align with the horizontal axis. For reasons which will become clear shortly, each direction also corresponds to a two-character identifier. It is intuitively clear that the grid space only represents relative information between modules. With an additional step, absolute information can be added. Combined, this is sufficient for general placement representation.

Figure 6.4: Several conceptual aspects are shown of the placement of rectangular modules in the (x, y) plane in connection with relative relationships/directions in the sequence pair space (α, β).

Now the concept of a packing will be described.

Definition 12 (Packing) A packing is a minimum-area feasible placement of rectangular modules associated with a given SP.

A packing essentially adds absolute information to the relative representation of the SP. However, there are still a few degrees of freedom left (unexploited) within a packing. Therefore, the Left-Down packing is defined.

Definition 13 (LD-Packing) An LD-packing is a packing in which each module is moved left and down as much as possible while preserving the topology dictated by the sequence pair.

Except when noted otherwise, each packing is an LD-packing in the remainder of this chapter. We will explain the notion of sequence pair first by an example and then more formally. An SP consists of two ordered sequences (or permutations)

α = (α_0, α_1, …, α_{n−1})   and   β = (β_0, β_1, …, β_{n−1}),

where the sequence elements are unique integers from {0, 1, …, n − 1}. These integers are identifiers of the modules in the placement problem. Wherever convenient, we synonymously use a sequence element to denote a module. The sequences can be seen as two orthogonal axes that span a 2-dimensional grid-space. An example is shown in Figure 6.5.

Figure 6.5: Visual representation of SP (α, β) = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)).

The ordering of the elements (modules) in both sequences determines the relative relationships between these elements. For each pair of elements we have a before/after relationship within each sequence.
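This bookkeeping is trivial to mechanize; the following Python sketch (our own illustration, not code from this thesis) classifies the relative position of one module with respect to another, anticipating the four relationships defined next:

def relation(alpha, beta, a, b):
    """Return the position of module b relative to module a."""
    ia, ib = alpha.index(a), alpha.index(b)   # positions in sequence alpha
    ja, jb = beta.index(a), beta.index(b)     # positions in sequence beta
    if ia < ib and ja < jb:
        return "right of"    # b after a in both sequences
    if ia > ib and ja > jb:
        return "left of"     # b before a in both sequences
    if ia < ib and ja > jb:
        return "below"       # b after a in alpha, before a in beta
    return "above"           # b before a in alpha, after a in beta

alpha = [2, 3, 4, 5, 6, 8, 1, 7, 0, 9]        # the sequence pair of Figure 6.5
beta  = [0, 5, 7, 9, 6, 4, 2, 3, 8, 1]
print(relation(alpha, beta, 5, 1))            # right of: 1 is after 5
print(relation(alpha, beta, 6, 4))            # above: 4 is above 6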


The combination of two sequences yields four relative relationships between module pairs: after, before, below, above. We say that a module b is after (or right of) a when b is located after a in both sequences α and β. A module b is before (or left of) module a if it is located before a in both sequences α and β. If a module b is after module a in sequence α and before module a in sequence β, then we say that b is below a. Similarly, if b is before a in α and after a in β, then we say that b is above a. This is also clear from the visual representation of an example sequence pair shown in Figure 6.5. For example, module 1 is after module 5, and module 4 is above module 6. From α and β four sets (S_aa, S_ab, S_ba, S_bb) can be derived, called the S sets⁴, which define the topological relationships right of, below, above, and left of, respectively. The definition of these sets is:

S_aa(m) = ( ⋃_{i = pos_α(m)+1}^{n−1} {α_i} ) ∩ ( ⋃_{j = pos_β(m)+1}^{n−1} {β_j} )   (6.1)
S_ab(m) = ( ⋃_{i = pos_α(m)+1}^{n−1} {α_i} ) ∩ ( ⋃_{j = 0}^{pos_β(m)−1} {β_j} )   (6.2)
S_ba(m) = ( ⋃_{i = 0}^{pos_α(m)−1} {α_i} ) ∩ ( ⋃_{j = pos_β(m)+1}^{n−1} {β_j} )   (6.3)
S_bb(m) = ( ⋃_{i = 0}^{pos_α(m)−1} {α_i} ) ∩ ( ⋃_{j = 0}^{pos_β(m)−1} {β_j} )   (6.4)

where ⋃ is the union operator and ∩ is the dissection (intersection) operator. By α_i we denote the element on position i of sequence α, with a count index starting at 0. By pos_α(m) we mean the index of element m in sequence α. If the upper index value is lower than the lower index value, the union operator gives ∅. Furthermore, we define S(m) to denote any of the four sets when the subscript is irrelevant. For example, if pos_α(m) = n − 1, then the first union in (6.1) and (6.2) is empty and S_aa(m) = S_ab(m) = ∅. If, in addition, pos_β(m) = n − 1, then S_ba(m) = ∅ as well. It can be easily seen that the S sets contain a lot of redundant information. For example in Figure 6.5, the set S_aa(4) tells that elements 8 and 1 are right of element 4. Since the set S_aa(8) already tells us that element 1 is right of 8, it is unnecessary to record this information once again in S_aa(4), because all relative relations are transitive. In other words, if 1 is right of 8 and 8 is right of 4, this implies that 1 is right of 4. As the S sets information is stored for computation or later retrieval [48], it is more efficient to find a less redundant description. We introduce the R sets, which are derived from the S sets as follows:

R_aa(m) = S_aa(m) \ { x ∈ S_aa(m) : ∃ u ∈ S_aa(m) with x ∈ S_aa(u) }   (6.5)
R_ab(m) = S_ab(m) \ { x ∈ S_ab(m) : ∃ u ∈ S_ab(m) with x ∈ S_ab(u) }   (6.6)
R_ba(m) = S_ba(m) \ { x ∈ S_ba(m) : ∃ u ∈ S_ba(m) with x ∈ S_ba(u) }   (6.7)
R_bb(m) = S_bb(m) \ { x ∈ S_bb(m) : ∃ u ∈ S_bb(m) with x ∈ S_bb(u) }   (6.8)

In essence, the R sets are derived from the S sets by removing all (redundant) transitive information. If we leave out the subscript, then R(m) represents any of the sets given in (6.5) to (6.8); R_·a(m) represents either of the sets given by (6.5) and (6.7); R_a·(m), R_·b(m) and R_b·(m) are defined analogously. For simplicity, we will use R instead of R(m) if no
4. The subscripts denote a combination of after (a) and before (b).

confusion is possible. Note that the S_aa sets and S_bb sets are symmetrically related. This also applies to the S_ab and S_ba sets [48]. Formally this is written as:

x ∈ S_aa(m) ⟺ m ∈ S_bb(x),   (6.9)
x ∈ S_ab(m) ⟺ m ∈ S_ba(x),   (6.10)

where x, m ∈ {0, 1, …, n − 1}. Thus all (relative) topological information is available in two orthogonal (non-symmetrical) sets; for instance the S_aa and S_ab sets. However, for practical applications it is very useful to maintain all sets, for instance to improve run-time performance. Note that the R sets maintain local topological information, whereas the S sets have a global character. An interesting property of the R sets is that they are necessary and sufficient to calculate a packing based on constraint graphs [81], under the assumption that we have no a priori knowledge of the sizes of the modules. In practice, we do have this knowledge, but it will turn out that a complete dissection of relative and absolute placement computation is advantageous for incremental computation. This will be explained further on. Note that we are able to represent any packing of rectangular modules with the SP [48, 10], because we can find a sequence pair for every packing. Another advantageous property of the sequence pair is that it can be uniquely visualized in two dimensions by an oblique grid representation. The -45 degree axis represents sequence α and the +45 degree axis represents sequence β. Figure 6.6 shows an example of an SP and its oblique grid representation, denoted by oblique grid hereafter. Furthermore, each module has

Figure 6.6: Oblique grid representation of sequence pair (α, β) = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)).

four so-called views, which uniquely correspond to the R_aa, R_ab, R_ba and R_bb sets. For example, the view of module 5 is the shaded area in Figure 6.6. It is clear that every R set is a subset of its corresponding S set; for example, R_aa(5) ⊆ S_aa(5). Currently, two approaches exist to compute a packing from a sequence pair and a set of modules. The first approach, the graph-based method discussed in Section 6.5, is based on constraint graphs which contain the topological information given by a sequence pair. Within each constraint graph a longest path is sought. The longest path length that is found


corresponds to the width of a packing (for the horizontal constraint graph) and the height of a packing (for the vertical constraint graph). The second approach, which can be classified as a non-graph-based method, is discussed in Section 6.6. This approach is based on so-called longest common subsequence (LCS) computation [82]. The classical LCS problem is generalized into a maximum-weight common subsequence problem by Tang et al. [83]. A subsequence of a sequence is an ordered subset of the sequence elements in which the original relative order is preserved and adjacent subsequence elements need not be adjacent in the original sequence. We will give an overview of existing material on this topic and establish a few new links in this context, leading to a new lower bound on the computational complexity for computing a packing from scratch. Although the graph-based packing approach does not yield the most efficient packing computation technique in terms of computational complexity, it will yield convenient means to step into an incremental packing computation approach, which is described in detail in Section 6.7. The computational complexity of the graph-based packing approach is O(n log n), on average, and O(n²) in the worst case; the incremental approach only spends effort on the affected part of a packing. The best known average-case and worst-case computational complexity for non-graph-based packing computation is O(n log log n). Furthermore, in Section 6.11 constrained placement computation is described in detail. The basic idea is to impose spatial constraints on specific modules. For instance, a module can be forced to be placed at the right boundary of the chip area. Details will be given in the following sections.
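To make the preceding definitions concrete, the following Python sketch (our own illustration in the S/R notation used above; not an implementation from this thesis) derives the full S sets of (6.1)-(6.4) and prunes them to the R sets of (6.5)-(6.8) for a small sequence pair:

def s_sets(alpha, beta):
    """Full relation sets: for each module m, the modules right of m (S_aa)
    and above m (S_ba); the remaining two sets follow from the symmetry
    relations (6.9)-(6.10)."""
    pa = {m: i for i, m in enumerate(alpha)}   # positions in alpha
    pb = {m: i for i, m in enumerate(beta)}    # positions in beta
    S_aa = {m: {x for x in alpha if pa[x] > pa[m] and pb[x] > pb[m]} for m in alpha}
    S_ba = {m: {x for x in alpha if pa[x] < pa[m] and pb[x] > pb[m]} for m in alpha}
    return S_aa, S_ba

def reduce_transitive(S):
    """R sets: drop every element of S(m) that is also reachable through
    another element of S(m)."""
    return {m: {v for v in vs if not any(v in S[u] for u in vs if u != v)}
            for m, vs in S.items()}

alpha = [2, 3, 4, 5, 6, 8, 1, 7, 0, 9]
beta  = [0, 5, 7, 9, 6, 4, 2, 3, 8, 1]
S_aa, S_ba = s_sets(alpha, beta)
R_aa = reduce_transitive(S_aa)                # horizontal constraint-graph edges
print(sorted(S_aa[2]), sorted(R_aa[2]))       # [1, 3, 8] versus [3]

Because the S sets are transitively closed, the pruning step may simply drop every element reachable through another element of the same set; for this sequence pair it reduces S_aa(2) = {1, 3, 8} to R_aa(2) = {3}.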

6.5 Graph-Based Packing Computation


An advantage of packing computation based on graphs is that the field of graph algorithms and analysis methods has been explored thoroughly in mathematics and computer science. Therefore, we are supported by a readily available set of tools which might facilitate the analysis and design of more efficient placement computation methods in the context of mixed-signal layout generation. A striking feature of the sequence-pair representation is that it separates the relative and the absolute placement information. Thus, these two issues can be handled and eventually optimized independently in an algorithmic sense, yielding a 2-step approach.

6.5.1 Relative Placement Computation


As mentioned earlier, the R sets are obtained from the S sets by the observation that there is redundancy in the latter sets. Since the number and topology of the edges in the constraint graphs is one-to-one related to the R sets, and computing a packing by means of longest paths computations in a directed acyclic graph (DAG) requires O(|V| + |E|) time [34], it is important to have small R sets. The natural question arises whether or not these sets can be further pruned. This is not the case, as stated by the following theorem.

Theorem 2 All R sets [from (6.5) to (6.8)], e.g. the R_aa sets, are necessary and sufficient to compute a longest path through the horizontal or vertical constraint graph associated with an SP, with dynamically changeable module sizes.


Proof It is easy to see that the transitive edges induced by the constraint graphs constructed from the S sets are redundant, since we are looking for longest paths. Only these redundant transitive edges are removed in the R sets. Thus, sufficiency follows. To prove that the R sets are necessary, suppose that they are not necessary and we can do with less. Every element can have its weight increased to make it part of the longest path. Furthermore, the number of times an element occurs in an R set is equal to the number of unique paths that include this element. Thus, if we remove an arbitrary element from a subset of the R sets, then the path containing this element can no longer exist. This path could, of course, be a longest path by proper adjustment of certain weights. This is a contradiction. Thus necessity follows.

As a consequence, we can efficiently map an SP to its corresponding horizontal and vertical constraint graphs using the R_aa sets and the R_ba sets, respectively. These constraint graphs only represent the relative relationships. In order to obtain absolute placement information, i.e. all coordinates of all blocks, longest paths computations need to be performed. First we discuss how the relative placement information is computed; that is, we propose an algorithm to compute the R sets. In line with the way this algorithm approaches the problem, it is named the Direct View (DV) algorithm [81]. A few definitions to clarify some terminology are in place. For ease of discussion, the oblique grid is rotated 45 degrees clockwise. Moreover, we associate a quadrant, relative to a module, with each of four possible directions; quadrant 1 up, quadrant 2 left, quadrant 3 down, and quadrant 4 right.

Definition 14 (Direct View) A module a is said to have a direct view on module b in a specific direction, if and only if b is in the associated quadrant of a and there is no other module in the rectangle spanned by a and b. Module b is called the directly viewed module (or simply viewed module if no ambiguity is possible), and a is called the viewing module.

For example in Figure 6.7, modules 1 and 4 are directly viewed by module 3 when we look to the right. So module 3 has exactly two modules in its direct view to the right. Note that these viewed sets are exactly the R sets, and that every viewed set of size k induces k edges in the corresponding constraint graph. Now that we have shown, in the form of Theorem 2, that the information contained in the R sets is necessary and sufficient to represent all relative relationships between modules in a sequence pair context⁵, it is interesting to investigate the exact size of these sets. Since the size of the R sets directly depends on the sequence pair at hand, we need to make an assumption in this respect. Intuitively, it is plausible that during the initial phase of stochastic optimization, no preference is given for a specific type of sequence pair. Therefore, the assumption that random sequence pairs are generated initially is perfectly valid. However, one may object to this assumption and pose that during the final phase of optimization, the optimization algorithm, which is simulated annealing in our case, converges to a specific sequence pair that may be some kind of worst-case sequence pair. Albeit imaginable, there is no clear reason why a final sequence pair should exhibit worst-case behavior. Indeed, our experiments in Section 6.9.2 unambiguously show that final sequence pairs exhibit average-case behavior.
Even in the case where additional constraints are imposed on placement, there is no reason to assume some kind of adverse correlation between the structure of the constraint graphs and the quality of a placement.
5. We do not use a priori knowledge of the module sizes.


Figure 6.7: Rotated oblique grid representation of SP = ((0, 11, 7, 10, 3, 1, 6, 13, 2, 9, 4, 14, 12, 5, 8), (14, 6, 5, 3, 4, 1, 7, 11, 0, 10, 13, 2, 9, 8, 12)).

The previous discussion justifies taking a random grid distribution for analysis purposes, or more exactly, a randomly selected sequence pair from the sequence pair solution space with all elements being equiprobable. To facilitate the analysis, we define the following.

Definition 15 A subsequence s of a sequence α is an ordered subset of the elements of α, where the ordering is with respect to the element positions in α.

Furthermore, we observe the following.

Theorem 3 Each common subsequence of a sequence pair (α, β) is equivalent to a unique strictly increasing subsequence of sequence σ(β), which is a unique permutation of β. Even so, each strictly increasing subsequence in σ(β) corresponds to a unique common subsequence in (α, β).

Proof A common subsequence s of (α, β), where |s| denotes the size of s, implies that s is both a subsequence of α as well as of β. By construction of the constraint graph, each common subsequence is equivalent to a path through the constraint graph (from left to right). Define a relabeling function σ, which maps the modules in such a way that σ(α) is a strictly increasing sequence. Since s is a subsequence of α, σ(s) is a strictly increasing subsequence of σ(α). And since s is also a subsequence of β, σ(s) is also a (strictly increasing) subsequence of σ(β). Now choose a strictly increasing subsequence of σ(β). Since σ(α) is a strictly increasing sequence, it is clear (from an oblique grid visualization) that a path can only exist in the constraint graph if and only if the nodes on the path occur in strictly increasing order in σ(β). Thus the nodes on the path are also in a common subsequence of (σ(α), σ(β)), which is easily written as a common subsequence of (α, β) using σ⁻¹, with σ⁻¹(σ(m)) = m.


Corollary 1 We can analyze properties of sequence pair (α, β) indirectly by using the simpler single-sequence approach.

Definition 16 A maximal increasing subsequence of sequence π is a subsequence of π which can not be enlarged by adding elements from π without violating the monotonicity property.

With this definition we arrive directly at the following definition.

Definition 17 A longest increasing subsequence of sequence π is a maximum-cardinality subsequence over all maximal increasing subsequences of π.

Consequently, a longest increasing subsequence of π is always maximal, but not vice versa. Let us denote a maximal increasing subsequence by s, and its size by |s|. We state a theorem taken from [84].

Theorem 4 Given a random sequence of length n, which is a permutation of n distinct integers. The expected length of the longest increasing subsequence is asymptotically 2√n.

Recapitulating, we want to compute the expected number of edges in the sparsified constraint graphs associated with the R sets, under the assumption of uniformly random sequence pair selection. Let us consider the R_aa sets (associated with the horizontal constraint graph). The other sets can be treated similarly. Each pair (m, x) with x ∈ R_aa(m) is a directed edge in the constraint graph. So the number of outgoing edges from a node m is equal to |R_aa(m)|. What we want to determine is the total number of edges in the constraint graph. As mentioned before, this depends on the actual distribution of the modules in the grid, also called a pattern. The average number of edges is denoted by E_avg, while the maximum number of edges is denoted by E_max. Note that the average and maximum are taken over all possible grid patterns. Moreover, note that a pattern is equivalent to a permutation (Corollary 1), and that the set of patterns is the set of permutations of n elements. Hence, the total number of patterns is n! if we disregard the node labels. It can be easily verified that the maximum number of edges is obtained with a scenario such as shown in Figure 6.8, which consists of two columns of nodes consisting of n/2 nodes each (n is even, without loss of generality). Thus, the grid pattern has

E_max = (n/2) · (n/2) = n²/4   (6.11)

edges. Furthermore, if all nodes are vertically lined up in a single column, then the following holds: E(p) = n − 1 for that pattern p, where P is the set of all grid patterns and p ∈ P. This implies

min_{p ∈ P} E(p) ≤ E_avg ≤ E_max.   (6.12)

Thus,

n − 1 ≤ E_avg ≤ n²/4.   (6.13)

For ease of explanation, we define what we mean by a string.


Figure 6.8: A simple worst-case pattern in a grid.

Definition 18 A string is a closed subsequence of a sequence, which is uniquely defined by two elements in the sequence that denote the start and the end of the string, respectively.

Determining E_avg is done in the following simplified way, using Corollary 1. For the example shown in Figure 6.7 we can use a simple linear-time algorithm to define a mapping σ that transforms α into an increasing sequence. If the same mapping is applied to sequence β we obtain the permutation

π = σ(β) = (11, 6, 13, 4, 10, 5, 2, 1, 0, 3, 7, 8, 9, 14, 12),

which is shown in Figure 6.9. In this permutation we are looking for strings (π_s, …, π_e), with π_s < π_e, in which the numbers between π_s and π_e are smaller than π_s or larger than π_e. Formally:

∀k, s < k < e : (π_k < π_s) ∨ (π_k > π_e).   (6.14)


Figure 6.9: The grid representation of Figure 6.7 can be relabeled in linear time to the above pattern, and this pattern can be described uniquely with a single sequence or permutation.


This is equivalent to the notion that the rectangle induced by elements π_s and π_e is empty. In other words, π_s sees π_e, or equivalently, (π_s, π_e) is an edge in the sparsified constraint graph. Consider a random pattern in an n × n grid. It is easy to see that the number of strings of length ℓ is exactly n − ℓ + 1. The probability that a string complies with (6.14) is equal to the probability that π_s and π_e are two consecutive elements of the string set. Let us denote this probability by P(ℓ). For example, if the string is, for instance, (4, 7, 2, 5), then elements 4 and 5 are two consecutive elements and the string complies with (6.14). If the string is, for instance, (4, 2, 11, 10), then elements 4 and 10 are two consecutive elements. The number of ordered pairs of consecutive elements from a set of ℓ elements is exactly ℓ − 1. Furthermore, the total number of ordered pairs is ℓ(ℓ − 1). So

P(ℓ) = (ℓ − 1) / (ℓ(ℓ − 1)) = 1/ℓ,

where ℓ is the length of the string. The expected (or average) number of edges in an n × n grid is now simply computed as:

E_avg = Σ_{ℓ=2}^{n} (n − ℓ + 1) P(ℓ) = Σ_{ℓ=2}^{n} (n − ℓ + 1)/ℓ.   (6.15)

After rewriting (6.15) with the identity [51]

Σ_{k=1}^{n} 1/k = H_n = ln n + γ + O(1/n),   (6.16)

where γ is Euler's constant, this gives the closed-form expression

E_avg = (n + 1) H_n − 2n ≈ n ln n + (γ − 2)n,   (6.17)

which states the average number of edges in a constraint graph explicitly. We state this result in a theorem.

Theorem 5 The expected number of edges in a constraint graph is equal to (n + 1)H_n − 2n,⁶ if each sequence pair is equiprobable.

6. For simplicity, but without loss of generality, we disregard the edges coming from the source node and going to the target node, wherever convenient.

Theorem 5 implies that no graph-based algorithm exists which has average computational complexity lower than O(n log n) for computing the mapping from sequence pair to a packing (from scratch), under the assumption that all sequence pairs are equally likely. It follows directly from (6.17) that the average number of edges per node in the constraint graph is

E_avg / n = ((n + 1)H_n − 2n) / n ≈ ln n + γ − 2,   (6.18)

which is essentially O(log n). This result stimulates us to search for an algorithm which performs about O(log n) of work per node, resulting in an overall (average) computational complexity of O(n log n) for all nodes in a constraint graph.
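A quick numerical experiment (our own check, not part of the thesis) confirms the closed form (6.17): for random permutations, the observed number of visible increasing pairs closely tracks (n + 1)H_n − 2n.

import random

def count_edges(n):
    """Count pairs i < j whose spanned rectangle is empty in a random
    permutation, i.e. pi[i] < pi[j] and no intermediate value lies between
    them; these are exactly the direct-view edges of one constraint graph."""
    pi = list(range(n))
    random.shuffle(pi)
    edges = 0
    for i in range(n):
        best = float("inf")              # smallest value > pi[i] seen so far
        for j in range(i + 1, n):
            if pi[i] < pi[j] < best:     # pi[j] is visible from pi[i]
                edges += 1
                best = pi[j]
    return edges

n, trials = 200, 50
avg = sum(count_edges(n) for _ in range(trials)) / trials
H = sum(1.0 / k for k in range(1, n + 1))
print(avg, (n + 1) * H - 2 * n)          # the two values should be close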


6.5.2 An Efficient Relative Placement Algorithm


Motivated by the previous theoretical results, it is interesting to investigate the existence of an algorithm that can compute the R sets in an efficient manner. That is, an algorithm that has computational complexity equal or very close to the number of nodes and edges in a constraint graph, which has been shown to be O(n log n), on average. The result of our investigation is the algorithm shown in Figure 6.10. We call it the Direct View (DV) algorithm, after the visual interpretation of the grid points that are visible from a given node, in accordance with Definition 14.
Input: sequence pair (α, β)
Output: R sets which capture the relative relationships associated with (α, β)

 1  /* right-to-left scan of the grid */
 2  Find_Bracket_Pairs(α, β)
 3  Initialize_BP_POS()
 4  for i := n − 1 downto 0 step −1 do
 5      m := α_i
 6      v := closest_below(m)
 7      while v ≠ nil do
 8          if m views v
 9          then add v to the appropriate R set of m
10               Update_BP_POS(m, v)
11          v := successor(v)
12      od
13      /* maintain the trees of open bracket pairs */
14      if m opens a bracket pair
15      then Traces_OBP_Add(m)
16      …
17      if m closes a bracket pair
18      then Traces_OBP_Del(m)
19      …
20  od
21  /* left-to-right scan of the grid */
22  /* create the remaining R sets from the scanned R sets */

Figure 6.10: The Direct View algorithm in pseudo code.

The DV algorithm performs a right-to-left scan and a left-to-right scan of the grid to gather enough information to construct the R sets. During the scans so-called bracket pairs are used, which are defined to be a pair of nodes on adjacent horizontal grid lines. For every bracket pair the second node should lie above or below the first node for a right-to-left or left-to-right scan, respectively. An opening bracket is associated with the first node of such a pair, and the closing bracket with the second node. The first node of a bracket pair is also its identifier. For example, in Figure 6.7 the bracket pairs for the right-to-left scan are: [10,7], [1,3], [13,6], [2,13], [9,2], [12,14] and [8,5]. Function Find_Bracket_Pairs finds all bracket pairs in one vertical scan of the grid, which requires O(n) time and space complexity. Function Initialize_BP_POS holds some accounting information on the closest viewing node associated with a bracket pair. This accounting information can also be obtained in O(n). During the right-to-left scan of the grid, two trees are maintained which hold the viewed nodes for the next node in the scan. The top-down tree holds all nodes for the downward view, and the bottom-up tree holds all nodes for the upward view. On line 8 a conditional statement checks if m views v,



 1  Traces_OBP_Add(b)
 2  begin
 3      insert(T_td, b)
 4      insert(T_bu, b)
 5      v := closest_below(b)
 6      while v ≠ nil do
 7          u := successor(v)
 8          if …
 9          then u := successor(u)
10          …
11          v := successor(u)
12      od
13      /* similar procedure for the bottom-up tree */
14  end


Figure 6.11: Subroutine for updating the traces in the top-down search tree and bottom-up search tree after adding a node associated with a bracket pair.
 1  Traces_OBP_Del(b)
 2  begin
 3      delete(T_td, b)
 4      delete(T_bu, b)
 5  end

Figure 6.12: Subroutine which deletes a node associated with a bracket pair from the top-down and bottom-up trees.

which can be performed in constant time. If the statement is true, then the appropriate R set is updated. Update_BP_POS updates the accounting information for this bracket pair, which takes constant time. This loop breaks when the leaf node of the trace has been processed. Lines 14 through 19 check whether a bracket pair should be opened or closed or not, and call the functions to update the top-down tree and bottom-up tree accordingly. Analogously, the left-to-right scan is performed. After both scans have been performed, the R sets can be constructed. In Figure 6.11 and Figure 6.12 the update routines of the traces in the top-down and bottom-up trees are shown. When the DV algorithm finishes, all R sets have been determined, and herewith the constraint graphs are also known. Using these constraint graphs, we show next how to compute the absolute placement information.

6.5.3 Absolute Placement Computation


In order to evaluate the quality of a placement in any sense, we need to have absolute placement information. For example, absolute module information is necessary

- to derive exact locations of all pins connected to the modules in order to assess the (global) routing quality,
- to estimate the impact of substrate coupling between modules,
- to determine the total chip area.


Since a sequence pair and the derived constraint graphs or R sets do not provide absolute placement information in themselves, an additional mapping step is required to obtain absolute placement information from the graph representation. This required missing information to compute the absolute coordinates of the modules in a packing is directly derived from the module sizes. An efficient way to determine the absolute positions of all modules, using the constraint graphs and the module sizes, is by means of the longest paths algorithm. This algorithm effectively determines, from a given source node, all longest paths distances to all reachable nodes in the constraint graph. Since the constraint graph is directed and acyclic, the longest paths algorithm requires O(|V| + |E|) complexity for a constraint graph G = (V, E) [34]. This can be written as O(n + |E|). With Theorem 5 this leads to the result that the lower bound on the average complexity of computing the absolute module positions, using constraint graphs, is O(n log n). This result is a substantial improvement when compared with the original algorithm by Murata et al. [48], which has average (and worst-case) time and space complexity O(n²), where n is the number of modules to be placed. In the following, we will show how to compute the absolute module positions from the predetermined R sets and the module sizes, by a simple example. For simplicity, we only discuss the horizontal case. The vertical case is similar. The R_aa sets uniquely define the horizontal constraint graph, where all outgoing edges of a node m are given by R_aa(m). In order to compute absolute positions, every node is assigned a positive value (weight). This value is, for the horizontal case, equal to the width of the corresponding module. For example, with the sizes of the modules in this example shown in Table 6.3, and sequence pair (α, β) = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)),

Table 6.3: Sizes of the modules in the example placement with the constraint graph depicted in Figure 6.13 and the packing depicted in Figure 6.15.
module    0    1    2    3    4    5    6    7    8    9
width    13   44   36   91   79   56   58   35   28   28
height   49   55   82   90   36   84   33   27   70   65

the weighted constraint graph in Figure 6.13 is obtained. For didactical convenience, two additional nodes are introduced: a start node and an end node (both with zero weight). They serve as start point and end point while walking through the constraint graph from left to right. With the length of an edge equal to the weight of the node inducing that edge, the result of this walk is that for each node, the longest-path distance to that node is recorded with the node. In practice, a very efficient longest paths algorithm can be used to compute these distances. The final distance values are the x-coordinates of the bottom-left corner points of the associated modules. Note that the distance recorded with the end node is equal to the width of the chip area. In Figure 6.14, both horizontal and vertical constraint graphs are shown for the example sequence pair. After the longest-path distances have been computed for all nodes in the vertical constraint graph, the y-coordinates of the modules are known and an actual absolute placement is conceived. Figure 6.15 shows the final packing. Let us annotate the previous notions in a formal manner. The horizontal constraint graph is defined by G_h = (V, E_h), and the vertical constraint graph is defined by G_v = (V, E_v).


Figure 6.13: The horizontal constraint graph associated with SP (α, β) = ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)). For clarity, the node-induced edge weights are explicitly given. Longest path distances are tabularized below the graph.

node        0    1     2    3    4    5    6    7    8     9    end
distance    0    155   0    36   0    0    56   56   127   91   199
Figure 6.14: The (a) horizontal and (b) vertical constraint graph associated with the example sequence pair.

Here, V is the set of modules, extended with the start and end nodes, and the edge sets E_h and E_v are induced by the R_aa and R_ba sets, respectively.



Figure 6.15: The packing of 10 modules with the sequence pair and module sizes from Table 6.3.

Hence

E_h ∩ E_v = ∅   (6.19)

holds. If an efficient implementation based on an adjacency graph representation [34] is used, the memory requirements and time complexity of the graph operations are O(|V| + |E|) for constructing the constraint graphs, where E are the edges of the constraint graph. Without loss of generality, we will only consider the horizontal constraint graph hereafter, denoted by G = (V, E) if no confusion is likely. Formally, the longest-paths information is described by a longest-paths forest denoted by F, where the weight function w associates a module dimension with its corresponding node, and where d is a recursive distance function defined by

d(v) = 0                                   if v is a start node,
d(v) = max_{(u,v) ∈ E} ( d(u) + w(u) )     otherwise.   (6.20)

A node v is a start node if it does not have any incoming edges in G, or equivalently, indeg(v) = 0. The equations given by (6.20) are so-called Bellman-Ford equations. Due to the fact that the constraint graph is directed and acyclic, the set of equations given by (6.20) can be solved uniquely, by performing an ordering step (depth-first search) followed by a relaxation step, both requiring O(|V| + |E|) [34]. Furthermore, it is clear that the largest value of d(v) + w(v) equals the width of the packing. Summarizing, we proposed a graph-based approach for sequence-pair-to-packing computation which has (approximate) average computational complexity

O(n log n),   (6.21)

where n is the number of modules to be placed. The worst-case computational complexity of the approach is O(n²), but this can only occur in rare cases. The proposed algorithm is a significant improvement over the original O(n²) (worst-case and average-case) algorithm by Murata et al. [48].

6.6 Non-Graph-Based Packing Computation


A non-graph-based approach in the context of packing computation was first proposed by Takahashi [85]. He formulated the packing computation problem as a problem of finding a maximum-weight⁷ decreasing subsequence in a single sequence. Recently, Tang et al. [83] observed that a longest common subsequence in a sequence pair is equivalent to a path through the constraint graph. Therefore, a well-known longest common subsequence (LCS) algorithm [82] was employed to tackle the packing computation problem. It is intuitively clear that both approaches must be closely related. Actually, with the help of Theorem 3 and Corollary 1, we can argue that these approaches are essentially equivalent (from an abstract point of view). Figure 6.16 illustrates the previous ideas and shows their relationship.

Sequence pair: ((3, 5, 1, 6), (1, 3, 5, 6))
LCS((3, 5, 1, 6), (1, 3, 5, 6)) = (3, 5, 6)
Equivalent single-sequence (permutation) representation: (3, 1, 2, 4), all weights are 1.
Maximum-weight increasing subsequence of permutation (3, 1, 2, 4) is (1, 2, 4).

Figure 6.16: The relationship between a sequence pair and a single-sequence representation is shown.

Moreover, it is clear from this simple example that an increasing (decreasing) subsequence in the single-sequence representation corresponds uniquely to a horizontal (vertical) path in the constraint graph. Surprisingly enough, the non-graph-based approach allows for a more efficient computation of a packing than the previously proposed graph-based method. The reason for this is that, given fixed known element weights, not all edges in the constraint graph are needed for proper longest paths computation. In other words, by exploiting a priori information on the actual node weights, some edges in the graph need not be generated.
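The example of Figure 6.16 can be checked mechanically; the sketch below (our own illustration) builds the relabeling σ and confirms that the LCS of the sequence pair maps to an increasing subsequence of the resulting permutation:

alpha, beta = [3, 5, 1, 6], [1, 3, 5, 6]
sigma = {m: i + 1 for i, m in enumerate(alpha)}   # 3->1, 5->2, 1->3, 6->4
pi = [sigma[m] for m in beta]
print(pi)                                         # [3, 1, 2, 4]

# The LCS (3, 5, 6) of the sequence pair maps to (1, 2, 4), which is indeed
# an increasing subsequence of pi:
print([sigma[m] for m in (3, 5, 6)])              # [1, 2, 4]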

6.6.1 Maximum-Weight Common Subsequence (MWCS) Problem


The approach taken by Tang et al. [83, 47] is based on the longest common subsequence (LCS) computation technique [82]. The standard longest common subsequence algorithm assigns unit weight to each sequence element. This is not appropriate in the context of packing computation. Therefore, the original LCS algorithm has been generalized to handle weighted sequence elements. This generalized algorithm solves the maximum-weight common subsequence (MWCS) problem. It can be easily verified that a solution of the MWCS algorithm
7. A maximum-weight sequence is a sequence that has a maximum sum of the sequence element weights.


corresponds one-to-one to a longest path in the horizontal constraint graph. In [47] a very efficient weighted LCS algorithm is introduced, which is in fact the same algorithm as in [83] but with a more efficient priority queue. Since the sequence elements are taken from a finite set {0, 1, …, n − 1}, the Van Emde Boas data structure can be applied successfully here. The MWCS algorithm is given in Figure 6.17. For a detailed explanation of this algorithm we refer to the original paper [83]. From the amortized analysis given in [83] it is clear that
Input: sequence pair (α, β) and element weights w
Output: maximum-weight common subsequence

 1  Q := ∅
 2  for i := 0 to n − 1 do
 3      m := α_i
 4      p := pos_β(m)
 5      q := predecessor(Q, p)
 6      if q = null then ℓ(m) := w(m) else ℓ(m) := ℓ(q) + w(m)
 7      parent(m) := q
 8      insert(Q, p)
 9      s := successor(Q, p)
10      while s ≠ null do
11          if ℓ(s) ≤ ℓ(m) then delete(Q, s) else break
12          s := successor(Q, p)
13      od
14  od
15  return the subsequence traced back from predecessor(Q, n) via parent()

Figure 6.17: The maximum-weight common subsequence (MWCS) algorithm.

the following theorem must hold for the MWCS algorithm.

Theorem 6 The asymptotic complexity of algorithm MWCS is O(n · q), where q is the amortized complexity of the priority queue operations: insert(), delete(), successor(), predecessor().

Proof Obviously the loop from line 2 to line 14 iterates n times. Let us denote the computational complexity of each of the queue operations by q_i, q_d, q_s and q_p, where the subscript is the first character of the queue operation name. Then it follows directly that the worst-case computational complexity of all operations, excluding the while loop from line 10 to 13, is equal to O(n(q_i + q_p + q_s)). For the while loop, we can perform an amortized analysis which goes as follows. Since each element is inserted exactly once into Q, the total number of deletions is never more than n. Only if an element is deleted, the successor operation on line 12 is executed. Therefore, the amortized computational complexity of the while loop is O(n(q_d + q_s)). As a result, the overall worst-case computational complexity of algorithm MWCS is O(n(q_i + q_d + q_s + q_p)), which can also be expressed as O(n · q).

A direct consequence of Theorem 6 is that an implementation of algorithm MWCS with O(n log log n) asymptotic time complexity and O(n) space complexity is possible, using the Van Emde Boas data structure [58], which is featured by O(log log n) worst-case time complexity per queue operation. Note that the complexity values associated with the non-graph-based approach are worst case, as opposed to the O(n log n) average-case complexity of the graph-based approach.
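For illustration, the following runnable Python sketch (our own; it replaces the van Emde Boas queue with a plain sorted list, so it runs in O(n log n) rather than O(n log log n)) implements the same dominance idea and returns the maximum achievable weight:

import bisect

def mwcs(alpha, beta, w):
    pos = {m: i for i, m in enumerate(beta)}     # positions in beta
    keys, score = [], []                         # keys ascending, scores ascending
    for m in alpha:
        p = pos[m]
        k = bisect.bisect_left(keys, p)          # entries with key < p: keys[:k]
        s = (score[k - 1] if k > 0 else 0) + w[m]
        keys.insert(k, p)
        score.insert(k, s)
        j = k + 1
        while j < len(keys) and score[j] <= s:   # purge dominated successors
            del keys[j]
            del score[j]
    return score[-1]                             # maximum total weight

alpha, beta = [3, 5, 1, 6], [1, 3, 5, 6]
w = {3: 1, 5: 1, 1: 1, 6: 1}                     # unit weights: plain LCS length
print(mwcs(alpha, beta, w))                      # 3, the length of (3, 5, 6)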


6.6.2 Maximum-Weight Monotone Subsequence (MWMS) Problem


As observed by Takahashi [85], a longest path through the constraint graph is equivalent to a maximum-weight increasing or decreasing (sub)sequence (after relabeling). Note that a weighted increasing subsequence, also called an up-sequence, is associated with a path through the horizontal constraint graph, while a weighted decreasing subsequence, also called a down-sequence, is associated with a path through the vertical constraint graph. This observation can be proved easily with the aid of Theorem 3, which essentially states that a sequence pair can be easily mapped to an equivalent single sequence. Formally, the problem can be stated as follows.

Problem:    Maximum-weight monotone subsequence (MWMS) problem
Instance:   A permutation π of the elements in {0, 1, …, n − 1}, and a weight function w.
Solutions:  The set S of all monotone increasing or decreasing subsequences of π, with s ∈ S.
Maximize:   Σ_{m ∈ s} w(m) over all s ∈ S.
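A simple way to solve the increasing case is a patience-sorting-style dynamic program; the Python sketch below (our own, with a sorted list standing in for a faster priority queue, giving O(n log n)) computes the maximum weight:

import bisect

def mwis(pi, w):
    """Maximum total weight of a strictly increasing subsequence of pi."""
    keys, score = [], []        # values seen; best chain weight ending there
    best = 0
    for v in pi:
        k = bisect.bisect_left(keys, v)
        s = (score[k - 1] if k > 0 else 0) + w[v]
        keys.insert(k, v)
        score.insert(k, s)
        while k + 1 < len(keys) and score[k + 1] <= s:
            del keys[k + 1]     # drop dominated entries
            del score[k + 1]
        best = max(best, s)
    return best

print(mwis([3, 1, 2, 4], {3: 1, 1: 1, 2: 1, 4: 1}))   # 3, e.g. (1, 2, 4)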

Since there are no fundamental differences between an increasing instance and a decreasing instance of the MWMS problem, we will simply call it the MWMS problem. Although a clear link has been established between computation of a packing and the maximum-weight monotone subsequence problem [85], incomplete links have been established between the maximum-weight monotone subsequence problem and related works in mathematics and computer science. The elegance and conceptual simplicity of the MWMS problem almost dictates that this problem is known and has been tackled before. Indeed, Mäkinen [86] surveyed the up-sequence problem and commented on the relationship between the MWMS problem and the maximum-weight clique problem in permutation graphs. It turns out that the maximum-weight clique problem in permutation graphs is equivalent to the MWMS problem. The former had been investigated by Chang and Wang [87] well before the introduction of the sequence pair representation. They proposed efficient algorithms for both the maximum-weight clique and maximum-weight independent set problems on permutation graphs with complexity O(n log n). Thus, in principle, the first and fastest known algorithm in terms of computational complexity for non-graph-based placement computation using the sequence pair representation was left undiscovered just until this moment. For completeness, we will mention the approach taken in [87]. First, we define a clique.

Definition 19 (Clique) A clique in a graph G is a complete subgraph of G.

In Figure 6.18, the highlighted set of nodes forms a clique, since every node in the set is connected to every other node in the set. In order to find a maximum-weight (sum of all clique element weights) clique in a permutation graph, Chang and Wang observe that an isomorphic interval graph can be constructed in linear time from a permutation, which is a compact equivalent representation of a permutation graph. The obtained interval graph is


then used to find a maximum-weight set of weighted intervals with a known algorithm due to Hsu [88] with complexity O(n log n), where n is the number of intervals (and also the size of the permutation). Effectively, a maximum-weight decreasing⁸ subsequence is obtained in O(n log n) time. For didactical purposes, let us consider again the sequence pair

visualized in Figure 6.5. Using Corollary 1, we map this representation to a single-sequence representation with the mapping σ, defined by σ(α_i) = i, which turns α into a strictly increasing sequence. If this mapping is applied to β, we arrive at the single-sequence representation (or permutation)

π = σ(β) = (8, 3, 7, 9, 4, 2, 0, 1, 5, 6).   (6.22)
A permutation graph associated with a permutation π is defined by G_π = (V, E), where V = {0, 1, …, n − 1} is the set of nodes and E is the set of edges defined by

E = { (i, j) : ( i > j ∧ pos_π(i) < pos_π(j) ) ∨ ( i < j ∧ pos_π(i) > pos_π(j) ) },   (6.23)
with i, j ∈ V. In words this means that an edge exists between two nodes i and j if and only if i is larger than j and i is located before j in the permutation π, or i is smaller than j and i is located after j in permutation π. Obviously, a one-to-one relationship exists between the permutation graph and the permutation. It can be verified, using (6.23), that a clique in a permutation graph corresponds uniquely to a strictly decreasing subsequence within the associated permutation. For example, (3, 2, 0) is a decreasing subsequence of π, and (3, 2) ∈ E, (3, 0) ∈ E and (2, 0) ∈ E. Thus, the set {3, 2, 0} forms a clique, as expected. Also, (3, 7) is not a decreasing subsequence. Consequently, there should not be an edge between 3 and 7 in the permutation graph G_π. This is indeed the case, as (3, 7) ∉ E. This notion can be easily generalized to the situation of weighted nodes. In that case, a maximum-weight clique corresponds with a maximum-weight decreasing subsequence, and vice versa.⁹ The technique proposed in [87] is to map the permutation graph (with permutation given) to a so-called isomorphic interval graph representation, for which Hsu [88] presented an algorithm to compute maximum-weight cliques. The crucial point here is the computational complexity of the isomorphic transformation. It is proven in [87] that this transformation can be performed in linear time. The formal transformation is discussed after illustrating the above ideas with an example. For ease of understanding, we use unweighted nodes in the following example. Furthermore, since we want to discuss the horizontal subcase (equivalent to increasing subsequences)
8. The approach for finding a maximum-weight increasing subsequence is similar.
9. If we want to apply the ideas to strictly increasing subsequences, we can simply reverse the permutation sequence. Another interesting equivalent problem in connection with sequence reversing is left out of this discussion. The interested reader is referred to [87].


of packing computation, and the algorithm given in [87] works by default on decreasing subsequences, we reverse the sequence of (6.22) and get

π^R = (6, 5, 1, 0, 2, 4, 9, 7, 3, 8).   (6.24)

Figure 6.18(a) shows the permutation graph for permutation π^R. Figure 6.18(b) shows the associated interval graph representation for this permutation graph. The construction of this graph is discussed shortly. The reader can easily verify that each pair of partially overlapping interval segments, say I_i and I_j, in Figure 6.18(b), with each segment associated with an element in π^R, corresponds uniquely to an edge (i, j) in Figure 6.18(a). Note that we deliberately put the nodes in the graph of Figure 6.18(a) in the same positions as given by the original sequence pair. Comparing this graph with the horizontal constraint graph of Figure 6.13 should directly reveal similarities. It is important to note at this point that a clique in the permutation graph of Figure 6.18(a) corresponds uniquely to a horizontal path in the constraint graph of Figure 6.13. As discussed before, an increasing subsequence corresponds uniquely to a clique. As a consequence, these notions are fully equivalent. Formally, a given permutation graph, with permutation π given, is mapped to an interval graph representation as follows. Each interval is defined as I_i = [−pos_π(i), i], i ∈ {0, 1, …, n − 1}, where positions are counted from 1. Add a super-interval [−(n + 1), n], which is required for Hsu's algorithm. If and only if two intervals [a_i, b_i] and [a_j, b_j] have partial overlap, i.e. either a_i < a_j ≤ b_i < b_j or a_j < a_i ≤ b_j < b_i, then an edge exists between nodes i and j in the permutation graph. With the constructed interval graph, the algorithm proposed by Hsu [88] can be used to compute a maximum-weight clique in the interval graph in O(n log n) time and space complexity, essentially similar to the approach and results of Tang et al., which was published many years later. However, it must be noted that the algorithm of Tang et al. is conceptually easier to understand. It is posed as an open problem whether or not the MWMS problem can be solved in linear time within our standard model of computation. However, we can derive that (in theory) it is possible to solve the MWCS problem in smaller complexity than O(n log log n), which is obtainable through the use of existing practical data structures. In [89] optimal bounds on the predecessor problem are established. The theoretical result of that paper is a new data structure which stores n integers from a universe of size N in polynomial space and performs predecessor queries in

o(log log N) time for polynomial universes. In conjunction with Theorem 6, we may conclude that the computational complexity of algorithm MWCS can be improved to

o(n log log n).

Since a solution of the MWCS problem is also a solution to the MWMS problem, the same achievable computational complexity holds for the latter. Summarizing, we can say that

- The non-graph-based placement computation approach is computationally more efficient than a graph-based approach. The former can be practically implemented with O(n log log n) complexity, while the latter needs O(n log n).


(a) A permutation graph corresponding to permutation π^R.

(b) An interval graph which is isomorphic to the permutation graph of (a).

Figure 6.18: The permutation graph corresponding to permutation π^R and an isomorphic interval graph.
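The interval construction can be exercised mechanically. The sketch below (our own illustration; the interval formula I_i = [−pos(i), i] is our reading of the construction above) checks, on a small permutation, that partial overlap of intervals coincides with adjacency in the permutation graph:

def intervals(pi):
    pos = {v: p + 1 for p, v in enumerate(pi)}        # positions 1..n
    return {v: (-pos[v], v) for v in pi}

def partial_overlap(a, b):
    (a1, b1), (a2, b2) = a, b
    return a1 < a2 <= b1 < b2 or a2 < a1 <= b2 < b1

def pg_edge(pi, i, j):
    pos = {v: p for p, v in enumerate(pi)}
    return (i < j) == (pos[i] > pos[j])               # reversed order = edge

pi = [3, 1, 2, 4]
iv = intervals(pi)
pairs = [(i, j) for i in pi for j in pi if i < j]
assert all(pg_edge(pi, i, j) == partial_overlap(iv[i], iv[j]) for i, j in pairs)
print("interval overlap matches permutation-graph edges")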

- Owing to the fact that some redundancy is incorporated in the graph-based placement computation approach, it can be more easily generalized to an incremental approach. We do not make any claim that this is impossible with the non-graph-based approach. However, it is surely much more difficult. Another advantage is that the division between relative and absolute computation of the graph-based approach yields better (visual) insight into the problem. Consequently, analysis and design of relevant algorithms is substantially facilitated.

From both a theoretical and a practical point of view, it is more interesting to investigate an incremental generalization of the graph-based placement computation approach. From


theoretical analyses, we might find interesting and exploitable properties that were previously unknown. Also, fundamental links with other approaches might be indirectly established. From practice, we gain important experience on how the incremental approach relates to the non-incremental approach in terms of run-time performance. From this information, practical guidelines can be derived for the usage of the incremental algorithms.

6.7 Graph-based Incremental Placement Computation


In the context of a stochastic optimization framework such as simulated annealing, typically very small changes (perturbations) are applied during each iteration (move generation). It is intuitively clear that a small change in the input is usually associated with a small change in the output. More specifically, a small change in the topology of the constraint graphs or in the weight of a node typically causes only a part of the absolute placement to change. It is obvious that recomputing the entire absolute placement after each small change is a waste of computation time. Therefore, it is interesting to find more efficient means to compute the absolute placement after applying a small change to the abstract sequence-pair representation. This efficient manner of updating strictly necessary information is called incremental computation. Note that the incremental computation approach does not involve any approximations and, therefore, no induced errors; it is exact. This is often not the case in contemporary literature. A change in the placement can come about in several ways, induced by the perturbation operator set we have defined within the simulated annealing environment. In our case, essentially, a distinction can be made between the following perturbations:

-swap: the topology of the constraint graph generally changes, and also the longest paths information must be updated;

swap: the topology of the constraint graph is unaffected, but the longest paths graph must be updated;

rotate (over ): there is no change in relative relationships, and the longest paths graph must be considered for an update only if the rotation angle is 90 or 270 degrees;10

mirror (horizontally or vertically): the constraint and longest paths graphs are unaffected.

Typically, only a small part of a placement is actually affected by a perturbation. This fact can be exploited by an incremental computation approach. Furthermore, as will become clear shortly, the incremental approach is exact, i.e. no error in the placement is introduced. As a consequence, the quality of a placement obtained by incremental techniques is essentially the same as one obtained by compute-from-scratch techniques. It is convenient to specify exactly what is meant by an affected module and a moved module in this context.

10 Note that rotation over 180 degrees does not have any influence on the placement, but it does influence the routing because pin positions are changed.


Definition 20 (Affected module) A module is affected if it is an operand during a perturbation, or if its location can be influenced by that perturbation.

Definition 21 (Moved module) A module is called a moved module if its location has actually changed due to a perturbation between consecutive iterations.

If a module changes orientation or is moved, generally all nets connected to that module have to be re-routed. As a result, in all of the above perturbation cases, the routing has to be recomputed in some way. Generally, we can state that the more modules are affected, the more routing effort is required. We split the incremental packing computation into two parts. The first step computes the modified constraint graph in an incremental way; the second step computes the longest paths information in an incremental way.

6.7.1 Incremental Relative Placement Computation


The incremental computation of the constraint graphs is shown for the horizontal case; the vertical case can be treated similarly. For the constraint graph computation we need only consider the perturbations -swap and -swap, as they are the only ones that affect the topology of the constraint graph. The swap operation does not change the topology of the constraint graph; the only thing that needs to be done is interchanging the labels of two nodes (and their associated fields). Operations -swap and -swap induce a change in the topology of the constraint graph, and as shown in Figure 6.19 and Figure 6.20 this change in topology depends on the type of swap and the relative orientation of the nodes under consideration.

Figure 6.19: The direction of rotation associated with an -swap depends on the relative positions of modules and . The nodes are drawn in an oblique grid view.

Figure 6.20: The direction of rotation associated with an -swap depends on the relative positions of modules and . The nodes are drawn in an oblique grid view.

Let us call these two nodes and , and let them divide the grid into nine regions. The gridlines belonging to nodes and are not part of these regions.

Furthermore, for ease of discussion, we will rotate the oblique grid by 45 degrees so that we obtain a grid with horizontal and vertical lines, in which sequence is vertically aligned with the left side (top down) and sequence is horizontally aligned with the bottom side (left to right). Figure 6.21 visualizes this idea. The nodes in these regions are elements of nine disjoint sets.

Figure 6.21: Two nodes and divide the grid into nine regions.

From Figure 6.19 and Figure 6.20 it is clear that there are four possible scenarios for computing the new sets after a perturbation operation. In order to construct a fast and efficient algorithm, we assume that all elements of a set are stored in order, where the order is defined by the position of the element in sequence . This could, for instance, be achieved using an implementation based on balanced binary search trees, such as splay trees [54].

We discuss an example to illustrate the way in which the sets are updated. Suppose we have the situation shown in Figure 6.22. The situation before the -swap is denoted by , and the situation after the -swap is denoted by . The two nodes and are vertically oriented (in the oblique grid) at time , and after the -swap at time they are horizontally oriented (in the oblique grid). Assume we want to update the sets and ; then the following two cases can occur:

1. and , where the algorithm in Figure 6.25 can be applied,
2. , where the algorithm in Figure 6.26 can be applied.

Figure 6.22: Perturbation -swap is performed on nodes and . The left side shows the situation before the perturbation at time ; the right side shows the situation after the perturbation at time .
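The update algorithms below rely on ordered-set primitives (find_max, find_predecessor, find_successor, insert, delete) over elements sorted by their position in a sequence. The thesis suggests splay trees [54] for this; purely as an illustration, the same interface can be sketched with a sorted list and Python's bisect module. This stand-in has linear-time insertion rather than the logarithmic cost of a balanced tree, and the class and method names are assumptions, not the thesis code.

    import bisect

    class OrderedSet:
        """Elements kept sorted by their position in a sequence."""
        def __init__(self, pos):
            self.pos = pos          # pos[v] = position of element v in the sequence
            self.keys = []          # sorted positions
            self.vals = []          # elements, aligned with self.keys

        def insert(self, v):
            i = bisect.bisect_left(self.keys, self.pos[v])
            self.keys.insert(i, self.pos[v])
            self.vals.insert(i, v)

        def delete(self, v):
            i = bisect.bisect_left(self.keys, self.pos[v])
            del self.keys[i]
            del self.vals[i]

        def find_max(self):                   # rightmost element
            return self.vals[-1] if self.vals else None

        def find_predecessor(self, v):        # rightmost element left of v
            i = bisect.bisect_left(self.keys, self.pos[v])
            return self.vals[i - 1] if i > 0 else None

        def find_successor(self, v):          # leftmost element right of v
            i = bisect.bisect_right(self.keys, self.pos[v])
            return self.vals[i] if i < len(self.vals) else None

Any container offering these five operations in logarithmic time reproduces the complexity bounds derived in Section 6.7.2.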


First an example is discussed to illustrate the approach, which is subsequently formalized into an algorithm. Figure 6.23 depicts an illustrative example in which the computation of the set is shown. The time order in which the nodes are processed is indicated by a number; the node with number 1 is processed first, and the node with number 6 is processed last. Furthermore, the dotted arrows visualize that a node is directly viewed by another node, and thus is an element of that node's set.

Figure 6.23: An example to illustrate the manner in which is constructed according to the algorithm shown in Figure 6.25.

At the left-hand side of Figure 6.23, the situation before the -swap of nodes and is depicted. We start from this scenario to find the nodes that will be in set after the perturbation, in other words the set. The search for the nodes in is initiated by looking for the rightmost node in region which is not directly viewed by another node in region . Once found, the nodes that are directly viewable by node at time are added from right to left to (which is initially empty). In Figure 6.23, the sequentially added nodes are numbered 2, 4 and 5.

These steps deserve some additional explanation. Searching for the rightmost node in region which is not directly viewed by another node in region is easily accomplished as follows. Start looking for the rightmost node in . Assume, without loss of generality, that this node (node 1 in Figure 6.23) lies in region . Clearly, none of the nodes in region can be an element of . Therefore, if the rightmost node that has been found is not in region , iteratively look for the rightmost node in the set of the last found rightmost node, until the newly found node is in region . Suppose this node is (node 2 in Figure 6.23). Search for the node just left of in . Suppose this node is . At this point, we distinguish two cases: and . In the latter case, we simply add to . So assume (node 3 in Figure 6.23). In this case, we set and proceed looking for the node in just left of the last found node in region (node 4 in Figure 6.23). This process is iterated (node 5 in Figure 6.23) until the node (node 6 in Figure 6.23) does not contain an element in its set which is left of the last added node (node 5). This completes the computation of .

Let us proceed with an example in which we want to compute . Assume, without loss of generality, that region is non-empty. Figure 6.24 depicts an example scenario. The determination of is performed in two phases. First, all directly viewable nodes in regions and are determined. Second, all directly viewable nodes in region are determined. Note that if and only if region is empty, node is an element of .


At the left-hand side of Figure 6.24, we see that the nodes in regions and which are to be added to are searched for in the view direction. We start with node and search for the leftmost node right of in (node 1 in Figure 6.24). This node is called the reference node. Now we proceed by using the last found node, say , to find the leftmost node right of node in (node 2 in Figure 6.24). This step is iterated (node 3 in Figure 6.24) until no such node can be found.

Figure 6.24: An example to illustrate the manner in which is constructed according to the algorithm shown in Figure 6.25.

The remainder of the elements in is located in region . Finding those nodes is straightforward. A strict requirement is that all those nodes should lie above the reference node (node 1 in Figure 6.24). First find the rightmost node, say , in (node 4 in Figure 6.24). Now find the rightmost node in which is left of the last found node (node 5 in Figure 6.24). Repeat this step until the previously given requirement is violated or no further nodes can be found.

Formally, the algorithm that computes the new sets for this specific orientation and perturbation scenario is shown in Figure 6.25. Function find_max searches for the element in with the largest position index in sequence . The execution of find_max on line 2 finds the rightmost node in regions and (if it exists). If no such element exists, then . In the first while loop, from line 4 to 7, the algorithm efficiently searches for the rightmost element in region . If no such element exists, then . The second while loop, from line 8 to line 19, determines all elements in . If a node is found in region using function find_max on a previously determined reference node , then this node must be in , which is accomplished on line 11. Once the rightmost node has been found, the rightmost node which is left of a previously added node to and directly viewable by the reference node is searched for.

Computation of the set is performed on lines 20 through 33. On line 22 the algorithm searches for the leftmost node which is right of and in . This implies that this node must be located in region of . Moreover, this node must be an element of , which is established on line 25. Figure 6.24 illustrates the search for the nodes in regions and which must be included in . However, the construction of is not complete yet, as region might contain more nodes that should be in . The determination of these nodes is accomplished by the while loop from line 30 to 33. Node , which is defined on line 23, acts as a reference node. All nodes in region must lie above this node and be an element of , which is a requirement to be part of .
Figure 6.25: Incremental update algorithm for updating the sets, . Before the perturbation, nodes and are vertically oriented (see Figure 6.22).

Construction of the and sets, starting from an initially horizontal orientation of and , goes according to the algorithm shown in Figure 6.26. With the explanation of the previous algorithm, it is easy to understand the actions of the algorithm in Figure 6.26. Note that for notational convenience a dummy node is introduced, with . Analogously, the other sets can be computed by properly adapting the previously discussed algorithms to the specific situations. It is also possible to use the symmetry relationships given by (6.9) and (6.10) to compute the sets for the other viewing directions.

Figure 6.26: Incremental update algorithm for updating the sets, . Before the perturbation, nodes and are horizontally oriented.

Due to the perturbation of nodes and , nodes in the regions , , and might have been affected too, in terms of their sets. Therefore, we have to trace down which of these nodes need to update their sets. Fortunately, by virtue of the symmetry relationships (6.9) and (6.10), these nodes can be determined easily. For the discussed scenario, the set is at most

(6.25)


It is not sufficient to use the union of difference sets given by

(6.26)

where the set difference is defined by . This is demonstrated by the scenario shown in Figure 6.27.

Figure 6.27: An example showing the insufficiency of using the set difference of nodes as given by (6.26). At the left side, and ; at the right side, and .

Clearly, updating only the nodes in the set difference would ignore node , due to the fact that node prevents node from being in , and node is not in . It is clear that definitely needs updating because node should not be in . Therefore, it is clear that using (6.26) is not adequate. Obviously, the use of (6.25) is sufficient. However, it may be possible to exploit a more clever technique, based on the symmetry properties of the sets given by (6.9) and (6.10), which minimally extends the set (6.26) to render it sufficient. This can be intuitively understood by the observation which we derive from the following theorem.

Theorem 7 If a non-empty rectangle induced by unperturbed nodes and becomes empty, or vice versa, due to a perturbation, both the set and the set require updating. The view direction is from node to node , and the opposite view direction is from node to node .

Proof By definition, if and only if the rectangle induced by and is empty. Therefore, if a non-empty rectangle becomes empty, or vice versa, we have a change and consequently the and sets must be updated.

A direct consequence of Theorem 7 is the following.

Corollary 2 If a previously empty rectangle becomes non-empty or a non-empty rectangle becomes empty, and is the moving node, then and must be updated. The directions and are defined in Theorem 7.

We can use Corollary 2 as an aid to identify pairs of nodes such as defined by Theorem 7. In other words, if a node requires an update of its set, we determine node and update its set. Finding can be accomplished with .
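The empty-rectangle characterization in Theorem 7 gives a direct, if slow, reference implementation for deciding direct visibility between two nodes; such an oracle is convenient for testing the incremental update algorithms against a from-scratch recomputation. The sketch below is a quadratic stand-in written for this text, not the thesis code; pos1 and pos2 (positions of each module in the two sequences) and the function names are assumptions.

    def rectangle_empty(pos1, pos2, a, b, nodes):
        """True iff no third node lies strictly inside the rectangle
        spanned by a and b in the (sequence-1, sequence-2) grid."""
        lo1, hi1 = sorted((pos1[a], pos1[b]))
        lo2, hi2 = sorted((pos2[a], pos2[b]))
        return not any(lo1 < pos1[c] < hi1 and lo2 < pos2[c] < hi2
                       for c in nodes if c not in (a, b))

    def right_view(pos1, pos2, a, nodes):
        """All nodes directly viewable from a in the rightward direction:
        b follows a in both sequences and the induced rectangle is empty."""
        return [b for b in nodes
                if b != a
                and pos1[b] > pos1[a] and pos2[b] > pos2[a]
                and rectangle_empty(pos1, pos2, a, b, nodes)]

Comparing the output of right_view before and after a perturbation identifies exactly the sets that an incremental update must touch.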


The complete set of unperturbed (static) nodes for which we have to update the and sets must be located in regions , , and . For ease of discussion, let us assume that we have a node located in region , as shown in Figure 6.28. Without loss of generality, assume nodes and are in regions and , respectively, and both nodes are viewed by . After the perturbation, the shaded area must be explored in order to find nodes which should be included in . With the aid of the previously discussed techniques it is quite easy to identify those nodes efficiently. The steps to be taken are formalized in the algorithm shown in Figure 6.29, which efficiently computes the set.

Figure 6.28: Node in region will be affected by the perturbation of nodes and . The shaded area induced by nodes and may contain nodes which will be viewed by after the perturbation.

On line 2, we search for node . If it exists, a variable records its position in sequence . If not, the shaded region extends to the lower boundary of the grid and is assigned a very large number. Line 4 determines node . If a node is found in region then, on line 7, we look for a module in 's view at time which is left of . If the node is located in region or does not exist, then node will be viewed by after the perturbation. In this case, lines 9 and 10 are executed, finding a node in the shaded region which extends to the right of region and is in 's view at time . Finally, in the loop from line 12 to line 15, we repeatedly add nodes to from 's view, if they are above node . This is done in a right-to-left fashion using the predecessor approach.

Figure 6.29: An algorithm to compute ; essentially determining any nodes that will be viewed in the shaded region of Figure 6.28.

It is somewhat elaborate but quite straightforward to adapt the algorithm in Figure 6.29 to the other cases.

6.7.2 Incremental Relative Placement Computational Complexity


The computational complexity of the complete incremental update algorithm for computing the new constraint graphs after an -swap or -swap has been applied is derived next. The analysis is based on the algorithm shown in Figure 6.25, where we assume that the perturbed nodes and at time are vertically oriented. Recall that each set is stored in a separate balanced binary search tree. The balanced binary search tree operation find_max() runs in , where is the number of elements in the tree (see Chapter 5). This complexity also holds for the other basic operations on the tree, such as find_min(), find_predecessor(), find_successor(), insert() and delete(). From (6.18), we have for the average size of a set

(6.27)

for a randomly picked (with each equiprobable). Therefore, the search will take . If (and only if) , the function returns . The loop from line 4 to line 7 will be iterated at most times due to (6.27). Moreover, the member check on line 4 can be performed in constant time. Consequently, the first while loop has complexity . The second while loop, from line 8 to 19, is also iterated at most times due to (6.27). At most elements are added on line 11. Adding elements to an (initially empty) set takes , because

(6.28)

where is some constant. Since find_predecessor() on lines 13 and 17 within this loop takes , the total computational complexity for constructing the updated set is .

Similar arguments hold for the computational complexity of the while loops from lines 24 to 28 and from lines 30 to 33; both run in . The resulting total computational complexity of the incremental update algorithm to compute the updated sets is . The same line of reasoning can be applied to the algorithm of Figure 6.26, which computes the sets, but where the nodes and are horizontally oriented before the perturbation.
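The average set size assumed via (6.27) can also be probed empirically. For a random permutation, the nodes directly viewable from a given node are the staircase minima of its upper-right quadrant, and the expected number of minima among m random points is the harmonic number, roughly ln m. The snippet below is an illustrative experiment written for this text, not thesis code, and it assumes (6.27) is the familiar logarithmic bound.

    import math, random

    def avg_visible(n, trials=200):
        """Average number of directly viewable nodes to the right of a
        random node in a random permutation (brute force)."""
        total = 0
        for _ in range(trials):
            perm = list(range(n))
            random.shuffle(perm)          # perm[i] = second-sequence position of node i
            a = random.randrange(n)
            # nodes in a's upper-right quadrant, scanned left to right
            quad = sorted((i, perm[i]) for i in range(n)
                          if i > a and perm[i] > perm[a])
            best = math.inf               # staircase minima = directly viewable nodes
            for _, y in quad:
                if y < best:
                    total += 1
                    best = y
        return total / trials

    for n in (50, 200, 800):
        print(n, round(avg_visible(n), 2), round(math.log(n), 2))

The measured averages track ln n closely, consistent with the logarithmic factors appearing throughout this section.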


To complete the analysis, the algorithm shown in Figure 6.29 should be included. This algorithm is the basis of the overall approach to compute the updated sets for all . On line 1 the old set is copied to the new set, which takes using (6.28). The operations find_predecessor() and find_successor() on lines 2 and 4, respectively, use . The same complexity applies to the operations on lines 7, 9 and 10. The dominant part of the algorithm is the while loop from line 12 to 15. By (6.27), the loop iterates times. Within each iteration we have to add an element to and run find_predecessor(). Thus, by similar arguments as before, the result sums up to a total computational complexity of .

Since there are nodes for which the aforementioned complexity is required, an upper bound on the average computational complexity is given by

(6.29)

Using (6.25), (6.27), and , an upper bound for the expression in (6.29) is

(6.30)

Hence, the resulting grand total for the complete incremental constraint graph computation approach is

(6.31)

Clearly, an absolute lower bound on the average computational complexity is

(6.32)

Note that both complexities in (6.31) and (6.32) cannot be reduced by using the more efficient Van Emde Boas data structure [58]. The reason for this is that the universe of elements within a single set has size . In theory, the aforementioned complexities can be improved, but from a pragmatic point of view this is at least impractical, since no implementations of the theoretically more efficient data structures have been reported as yet.

6.7.3 Incremental Absolute Placement Computation


Computing the absolute information in a placement boils down to computing the longest paths information. This can also be performed in an incremental manner, as will be shown next. In order to do this efficiently, the set of affected nodes is needed as an input parameter for the incremental longest paths algorithm. A new algorithm is given here to compute longest paths through a perturbed constraint graph. This part was published before in [9], but in a less extensive form. First, a simple algorithm is given to find a single strictly increasing sequence from a given sequence pair.

1. Find a unique permutation that maps sequence element-wise to a strictly increasing sequence .
2. Map sequence element-wise to a sequence using .


It can be verified that the following property holds for sequence : going from left to right through sequence , each sequence item that is smaller than all of its predecessors is a start node. The first sequence item is a start node by definition. It is easy to map such a sequence element in back to the original module number using the inverse mapping . For example, when we take the sequence pair shown in Figure 6.6, the mapping is given by

and the inverse mapping is obtained by reversing the direction of the arrows. When is applied to sequence of the example, the following sequence is obtained for :

The start nodes of this sequence are , which can be found in linear time. Using the inverse mapping it is straightforward to find the original start node numbers, which can be verified with Figure 6.6. The aforementioned approach also works for sub-sequences, i.e. a permutation of a subset of . This is also the benefit of this procedure, since the complexity is linear in the size of the sequence.11 Note that, in general, this does not hold for start nodes of a sub-sequence of .
11 With current sorting algorithms, the worst-case complexity is increased to when, and only when, .
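The prefix-minimum characterization above translates directly into code. A minimal sketch (not the thesis implementation), assuming seq_a and seq_b are the two sequences of the pair given as lists of module identifiers:

    def start_nodes(seq_a, seq_b):
        # sigma maps each module to its position in seq_a, so that applying
        # sigma to seq_a element-wise yields a strictly increasing sequence
        sigma = {module: i for i, module in enumerate(seq_a)}
        starts, running_min = [], len(seq_a)
        for module in seq_b:                  # scan the mapped sequence left to right
            if sigma[module] < running_min:   # smaller than all predecessors
                starts.append(module)         # => a start node
                running_min = sigma[module]
        return starts

    # start_nodes(["c", "a", "b"], ["a", "b", "c"])  ->  ["a", "c"]

A single pass with a running minimum suffices, which is the linear-time behavior claimed above; the same scan applied to a sub-sequence yields the start nodes of that sub-sequence.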


The new incremental algorithm which computes the longest paths through the constraint graph after a perturbation is discussed hereafter. Essentially, the longest-paths forest is made inconsistent after perturbing, and the purpose of the incremental algorithm is to recompute (partial) longest paths in order to make it consistent again. We define four types of inconsistencies:

1. under-consistent: applies when the distance value of a node is lower than its consistent value given by (6.20);
2. over-consistent: applies when the distance value of a node is higher than its consistent value given by (6.20);
3. LP-underconsistent: applies when and and is distance-consistent;
4. LP-overconsistent: applies when and , where .

We also refer to the first two inconsistencies as distance-inconsistencies, while the last two inconsistencies are called LP-inconsistencies. A graph is called distance-consistent when all distance values comply with (6.20). A graph is called consistent when it is both distance-consistent and LP-consistent.

The incremental longest-paths (ILP) algorithm is given in Figure 6.30 and operates as follows. On lines 1-3 the distance fields of all affected nodes are set to zero, so as to force correct computation of their new distances due to (new) incoming edges. The outer loop, starting on line 5 and ending on line 29, checks whether all candidate nodes given by set have been processed. Each processed candidate node is eligible for annotation as a moved module. Consequently, the number of moved modules is at most equal to the total number of candidate nodes. On line 6 the start nodes are found for set using the single-sequence approach described earlier. Note that line 6 is executed at least once (with ), and possibly thereafter when the priority queue is empty and . This occurs when the start node(s) propagate(s) changes through the longest paths forest, but not all affected nodes are processed during this update. The inner loop, starting at line 7, processes all (distance-)inconsistent nodes that are encountered during a single propagation wave. By virtue of the absence of cycles in (and ), an edge is processed at most once. Furthermore, extracting the smallest-distance node from on line 8 guarantees that all candidate nodes are made consistent exactly once. The latter is performed on lines 9 through 14. For each node that is made consistent, all its outgoing edges are processed and the corresponding nodes are checked for inconsistency on lines 15 through 23. Each outgoing node of is checked for under-consistency on line 17, and for over-consistency on line 18. Note that over-consistency can only occur if of the inconsistent input graph. An inconsistent node will have its distance updated on line 20, and it will be put (back) on the heap with its new distance value for further processing on line 21. Line 22 will tag an inconsistent node so that it will be made consistent in the iteration where it is extracted from the priority queue . Lines 23 through 26 cover the case in which a node is LP-inconsistent; in these cases is added to to re-establish consistency. Below are brief descriptions of the functions that are used in the incremental longest paths algorithm.

adjust_heap(): inserts in heap if is not an element of the key space of ; otherwise it adjusts the value field associated with if it is smaller than .

extract_min(): removes the pair with minimal value field from the heap and returns the key field of that pair.

recompute_dist(): computes the longest distance from all predecessors of to .

update_lp_pred(): updates the longest path information from the predecessors of to .

insert_lp_pred(): adds node as a longest-path predecessor node of node .

find_start_nodes(): finds all start nodes from set using the single-sequence technique as described previously.
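For intuition, a simplified incremental recomputation can be sketched in a few lines. Instead of the distance-ordered heap of Figure 6.30, the sketch topologically orders the part of the graph reachable from the affected nodes and re-evaluates the node-weighted longest-path recurrence there; nodes outside this region keep their stored distances. This is a hedged illustration only, not the thesis algorithm, and the container names preds, succs, weight and dist are assumptions.

    def incremental_longest_paths(preds, succs, weight, dist, affected):
        """Re-establish dist[v] = max(dist[u] + weight[u] for u in preds[v])
        (0 for start nodes) on the subgraph reachable from `affected`."""
        # 1. Collect every node whose distance may change.
        region, stack = set(), list(affected)
        while stack:
            v = stack.pop()
            if v not in region:
                region.add(v)
                stack.extend(succs[v])
        # 2. Topologically order the region (Kahn's algorithm, restricted
        #    to predecessors inside the region).
        indeg = {v: sum(1 for u in preds[v] if u in region) for v in region}
        queue = [v for v in region if indeg[v] == 0]
        order = []
        while queue:
            v = queue.pop()
            order.append(v)
            for w in succs[v]:
                if w in region:
                    indeg[w] -= 1
                    if indeg[w] == 0:
                        queue.append(w)
        # 3. Recompute distances in topological order; predecessors outside
        #    the region contribute their unchanged, stored distances.
        for v in order:
            dist[v] = max((dist[u] + weight[u] for u in preds[v]), default=0)
        return dist

Only the affected subtrees are re-evaluated, which is exactly the saving that the complexity analysis of the next subsection quantifies.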

6.7.4 Incremental Absolute Placement Computational Complexity


Consider the (horizontal) constraint graph induced by the sets. The graph does not need to be maintained explicitly. Instead, we can keep a reduced version of , denoted by , which contains the minimal longest paths information. A property of is that each node is reachable from a source node. A node is a source node if it does not have any incoming edges in . Ramalingam and Reps [90, 91] have shown that the computational complexity of the incremental single-sink-shortest-paths algorithm for general graphs with positive edge weights

is bounded by , where is an adaptive parameter that captures the set of vertices with a changed input or output value. Moreover, is the number of vertices of which the input or output value changes, and is equal to plus the number of edges incident on some node in . Because the single-sink-shortest-paths problem is similar to the single-source-longest-paths problem, the algorithm is suitable for incremental computation of longest paths in the constraint graphs. Fortunately, the constraint graphs under consideration are directed acyclic graphs (DAGs), and we know that for this subclass of graphs the longest path algorithm runs in . As a consequence of this property, we are able to use algorithms that have incremental computational complexity .

Figure 6.30: The incremental longest paths (ILP) algorithm.

A practical question remains on the parameter : how does it relate to the problem size? In general, is an unknown parameter; it can only be quantified after the actual computation. However, we know . In the specific case of a constraint graph induced by a sequence pair, we are able to say some things about in a quantitative way, under certain presumptions. Note that the impact of a random perturbation on is of a global nature, as opposed to the impact on the sets, which is of a local nature. The underlying reason for this is that essentially embodies both relative and absolute information changes, whereas the sets only represent relative information. Analyzing the average in terms of a random perturbation is most convenient from a mathematical point of view.


The validity of this approach is demonstrated in Section 6.9. Note that we have a bijective mapping between constraint graphs and sequence pairs. As a consequence, we can write every property of the constraint graph as a property of the corresponding sequence pair. Furthermore, the sequence pair properties can be analyzed in a simplified way using a single permutation (Corollary 1). We want to quantify the average size of , which is the expected number of nodes in a longest-paths subtree of the reduced constraint graph . We know that the constraint graphs have the property of being node-weighted, i.e. the outgoing edges of a node all have the same weight, determined by the corresponding node. However, to simplify the analysis we assume that, on average, the node weights do not determine the (average) topology of the longest-paths subtrees, but the depth values (determined by a depth-first search from the source node(s)) of the vertices do. This statement deserves some additional explanation. A graphical representation of an example packing of 10 modules, its horizontal constraint graph and its associated longest-paths subtree is shown in Figure 6.31. Intuitively, the previous assumption is quite reasonable, as the weight of a node does not change the topology of the constraint graph ; only might be affected. On the other hand, a change in the depth value of a node (caused by a perturbation) does affect the topology of , and therefore is always affected. Figure 6.32 shows the impact of a change in weight of a node. In this example, node 6 (module 6) has its weight (width) increased from 58 to 78. We see directly from Figure 6.32(b) that the topology of the constraint graph does not change compared to Figure 6.31(b). However, note the change in the longest-paths subtree : the longest path edge (3,8) is removed and edge (6,8) has become part of a longest path. This is also clearly visible from the packing in Figure 6.32(a), where modules 8 and 1 are moved to the right to allow module 6 to expand in width. Note that the width of the chip area increases from 199 to 206, as indicated by the distance value of dummy node . Figure 6.33 shows the packing and the associated constraint and longest-paths graphs after performing an -swap of nodes 4 and 7 on the state represented by Figure 6.31. We see directly from Figure 6.33(a) that the packing is quite substantially affected. Moreover, Figure 6.33(b) shows that the constraint graph and the longest-paths subtree are both affected.

Define , the size of a subtree in , as the number of nodes reachable from node plus one. Then the average subtree size is defined by

(6.33)

Each node in a subtree contributes to the total the number of times it occurs in any subtree. Let us call the total number of occurrences of a node in any subtree the multiplicity of that node. Thus we can write (6.33) as

(6.34)

The multiplicity of a node is exactly the number of ancestor nodes of plus one. We assume, without loss of generality, that each node has at most one parent. If a node has more than one parent in , this means that there is more than one longest path to this node.

Figure 6.31: (a) A packing of 10 modules and (b) the associated horizontal constraint graph and longest-paths subtree. The modules are drawn as nodes and the arrows denote relative relationships. Moreover, the solid arrows define the edges in the longest-paths subtree . Each node has two associated integers: the value left of the slash symbol (/) denotes the distance of the bottom-left coordinate of the module relative to the reference point 0, and the value right of the slash symbol denotes the weight (width) of the associated module.

If this situation occurs frequently on average, the diversity of the module dimensions must be low. In a practical problem instance this is highly unlikely, and thus the probability that a node has more than one parent is negligible.12 So the expected multiplicity of a node is also equivalent to the expected length of a maximal common subsequence of the sequence pair . The expectation should be taken, for a given element, over all possible configurations for a (typical) fixed topology of . The latter is done for simplicity but without loss of generality. Note that the average is taken over all possible set elements for a fixed topology. Thus, we have

(6.35)

From Theorem 3 we know that a maximal common subsequence is equivalent to a maximal increasing subsequence, which is denoted by (see Definition 17). As a consequence, we have

(6.36)

12 This implies that , on average.


Figure 6.32: (a) The packing of Figure 6.31 after increasing the width of module 6, and (b) the associated (unchanged) horizontal constraint graph and (changed) longest-paths subtree.

Thus, given a random sequence pair (with an associated constraint graph), the following can be derived. Using (6.36) we have

With the aid of Theorem 4 and given , this results in

Applying the Euler-Maclaurin summation formula to this finite sum, the approximation

is readily obtained. Finally, again using , we have

(6.37)

So the expected size of a subtree of is . As a consequence, on average, affected nodes and affected edges .
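The square-root-type growth that this derivation arrives at, and that Figures 6.37 and 6.38 later exhibit experimentally, is easy to probe numerically: the expected length of a maximal increasing subsequence of a random permutation approaches 2 times the square root of n (the classical Vershik-Kerov/Logan-Shepp limit). A quick, illustrative check written for this text (not thesis code) using patience sorting:

    import bisect, random

    def lis_length(perm):
        """Length of a maximal increasing subsequence (patience sorting)."""
        tails = []
        for x in perm:
            i = bisect.bisect_left(tails, x)
            if i == len(tails):
                tails.append(x)
            else:
                tails[i] = x
        return len(tails)

    for n in (100, 400, 1600):
        trials = 200
        avg = sum(lis_length(random.sample(range(n), n))
                  for _ in range(trials)) / trials
        print(n, round(avg, 1), round(avg / n ** 0.5, 2))   # ratio tends towards ~2

Each patience-sorting pass runs in O(n log n), so the experiment itself stays cheap even for large n.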


Figure 6.33: (a) The packing of Figure 6.31 after an -swap of nodes 4 and 7, and (b) the associated (changed) horizontal constraint graph and (changed) longest-paths subtree.

However, this is the average computational complexity which is needed in the worst case to update after deleting or inserting an edge or changing an edge length, hereafter collectively denoted by edge change [91]. Intuitively, it is plausible that efficient incremental algorithms for single-source longest paths in general graphs, such as described in [90, 91, 92], are not most efficient when applied to our restricted problem instances. It turns out that a more efficient approach can be used on the constraint graphs, by exploiting the knowledge that there is a lot of correlation between the longest paths induced by edge changes. Under the assumption that each module is equally probable to be affected, the expected size of a longest-paths subtree rooted at a randomly chosen node is directly obtained by taking the expectation of both sides of (6.34) and combining with (6.37):


Finally, applying (6.35) we obtain

This implies that the expected amount of change in is per affected node. This can be seen as a lower bound for the expected amount of work to be performed for a single affected node. Let us now analyze the incremental longest-paths (ILP) algorithm shown in Figure 6.30. steps. The set of affected nodes is assigned to , The initialization on lines 1-3 takes

(6.38)

6.7 Graph-based Incremental Placement Computation

105

the set of candidate modified nodes, on line 4. The actions on lines 11-13 are only performed for the candidate nodes, for which holds. Taking yields a too optimistic estimation. Under the reasonable assumption that a longest-paths subtree rooted at any one of the start nodes derived from set is highly likely to contain other affected nodes, the expected total number of iterations of the while loop at line 7 is better approximated by , as a consequence of (6.38). This also implies that the expected number of times that the while loop at line 5 is executed is , since candidate (and affected) nodes are subtracted from as they are encountered during the recomputation of the longest-paths subtrees. In other words, each time the algorithm executes line 5, is smaller than the previous time. Each invocation of line 15 explores nodes. Function adjust_heap() operates within and is called at most times as a consequence of the assumption. Furthermore, the average computational complexity of function find_start_nodes() implemented with splay trees is at most . All other operations have complexity. Summing up these results, an approximation of the average computational complexity of the incremental longest-paths algorithm is

(6.39)

Note that all and operations are conditional. This completes the analysis. It can be concluded that under reasonable assumptions, the ILP algorithm has near-optimal computational complexity.

6.7.5 Average Incremental Computational Complexity


A swap perturbation does not affect the constraint graph topology, because it is essentially equivalent to two node weight (inter)changes. However, the longest-paths graph generally needs to be updated, because . The incremental longest-paths algorithm can be used for this purpose, with the initial set of affected nodes . When a module is rotated over an angle of 90 or 270 degrees, the height and the width of the module are interchanged. Therefore, the weights in the respective constraint graphs are changed, resulting in a change of the longest-paths information. The longest-paths graph can be updated using the incremental longest-paths algorithm with the initial affected set equal to . It is clear that the -swap induces the most work for the incremental update algorithms. Therefore, the associated computational complexity is an upper bound on the average-case complexity over all perturbations.

6.8 Implementation Considerations


In principle, it is not necessary to construct the constraint graphs explicitly, as the sets represent the nodes and edges in the constraint graph in a bijective manner. However, for the sake of modularity and re-usability of code, the entities sets and longest-paths graphs are maintained separately and explicitly. A great advantage of the sets is the symmetry property given by (6.9) and (6.10). Storing the longest-paths information in a separate graph renders longest-paths information lookup possible in essentially constant time. In terms of complexity this approach does not incur any overhead, but clearly it is a trade-off between space, time, and flexibility.

6.9 Experimental Results


It is interesting to evaluate the new algorithms from two points of view: first, to verify the validity of the theory; second, to compare the performance with known algorithms. Performance is not only measured in execution speed, which of course is very important, but also in terms of scalability. That is, how does the running time increase as a function of the problem instance size? Although both parameters depend on the implementation quality of the programs, the optimization quality of the compiler, the computing hardware platform, et cetera, scalability is easier to evaluate in an absolute sense. It is much more difficult to provide reliable results for comparison with other published results. Therefore, it is advisable to view these results in a relative way, for instance by comparing CPU time and solution quality pair-wise. A quality measure which is commonly used is the percentage of dead or slack space, defined by

slack space = (final optimized area - total area of modules) / (final optimized area) x 100%   (6.40)
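As a worked instance of (6.40), illustrative only:

    def slack_space(final_area, module_area_sum):
        """Percentage of dead space as defined by (6.40)."""
        return 100.0 * (final_area - module_area_sum) / final_area

    # A packing whose modules cover half of the optimized bounding area:
    print(slack_space(200.0, 100.0))   # -> 50.0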

The above definition of slack space implies that a 0% slack space packing is an optimal packing. A 50% slack space packing contains the same amount of empty and non-empty space. Larger slack space values are associated with progressively worse packings. Note that an optimal packing is not always a 0% slack space packing. Summarizing, we conduct various experiments to establish experimental evidence of

the correctness of the theoretical analyses for a single random iteration,

the practical computational complexity under the assumption of equiprobable sequence-pair selection,

the validity of the theoretical assumptions in a practical SA optimization environment which employs a large sequence of iterations,

the efficiency of an incremental approach over a conventional approach in a practical simulated annealing environment.

6.9.1 A Single Iteration


In Figure 6.34(a), experimental results are shown for a single iteration of an incremental placement computation. The perturbation is chosen randomly and equiprobably from swap, -swap. The two modules to be swapped are drawn randomly from , independently and uniformly distributed. All values are averaged over 10,000 iterations, and the problem instance sizes under test range from 20 to 300. The actual behavior of the plotted curve is clarified by three additional plots shown in Figure 6.34(b)-(d); they are obtained from the original curve by , and , respectively.

Figure 6.34: (a) Average CPU time of a placement computation as a function of the number of modules, and (b)-(d) manipulated curves.


If, indeed, , then it is expected that is equal to . From Figure 6.34(c) we can see that is quite noisy with a very small positive trend, implying that the original curve follows (6.39) quite well. This is confirmed by Figure 6.34(d), which shows a decreasing trend towards zero. Hence, we may conclude that the average computational complexity of the implemented algorithms is near optimal.

6.9.2 Packing Optimization


We compute optimized packings for a series of problem instance sizes with randomly generated modules (within a specified range). Note that these results have been obtained using an implementation of the sequence-pair-to-packing mapping. The hardware/software configuration used is: Intel PIII 800 MHz CPU, 512 MByte RAM, SuSE 7.0 Linux OS. The optimization program is executed three times for each of the problem instance sizes in the sequence 20, 40, 80, 160, 320. Furthermore, a standard MCNC benchmark is used [93]. The benchmark name is ami49 and it consists of 49 modules, 408 nets and 953 pins. For the packing experiment only the number of modules and their sizes are relevant. The experimental results are summarized in Table 6.4.

Table 6.4: Experimental packing optimization without routing considerations (results for best out of three runs).

    instance   slack space (%)   CPU time [s]
    20         5.346             1.42 E 1
    40         3.521             1.61 E 1
    80         2.695             1.54 E 3
    160        4.132             5.69 E 3
    320        5.890             2.10 E 4
    ami49      2.849             6.88 E 2

A plot of the slack space of the optimized packings for three independent runs is shown in Figure 6.35. It is clear from this figure that there is quite some variation among the results of different runs on the same problem instance. Obviously, our implementation of the SA optimization algorithm has problems getting out of local optima when the number of modules is small. This could be explained intuitively by the fact that, with a small number of modules, a perturbation easily leads to a relatively large change in the cost value. The net result is a very irregular cost landscape and, therefore, worse convergence properties of the optimization algorithm. Furthermore, we can see from the figure that the amount of slack space increases significantly as a function of the problem instance size. This phenomenon can be explained by the relatively simple perturbation scheme that is used in our optimization framework. As we only consider relative perturbations with no knowledge of the absolute positions of the modules, many generated moves will affect modules which are spatially far apart. In the current approach there is no way of choosing modules which are relatively close together, even when the optimization is in its final phase. In other words, we cannot force the SA algorithm to sample the solution space more smoothly as optimization proceeds. Clearly, this unwanted effect will be increasingly more pronounced with increasing . We may conclude that the sampling behavior of our SA optimization scheme will become increasingly more inefficient for problem instances containing more than roughly 50 modules. Note that no tuning was involved in obtaining these results.
Figure 6.35: A plot of the slack space of several packings.

Figure 6.36 shows the CPU time for a single packing optimization run for a range of problem instances. The plot indicates a super-linear growing trend in CPU time for computing packings as a function of the number of modules in a packing. A closer inspection reveals that the trend is quadratic. One might wonder if there is a direct relationship between the computational complexity of a complete optimization run and the complexity of a single iteration. As will become clear shortly, indeed, a close correlation between these two complexities can be observed.
Figure 6.36: CPU time of a complete optimization run as a function of problem instance size for three independent runs per problem instance.


We also verify the validity of the equiprobable-selection assumption of Theorem 5 by plotting the average longest-paths tree size of the final optimization result for a wide range of problem instances. The program is run three times for each problem instance size. The average subtree size as a function of the problem instance size is plotted in Figure 6.37. The plot shows a clear sub-linear trend as a function of .
Figure 6.37: The average subtree size in the longest-paths graph as a function of the problem instance size after packing optimization without routing considerations.

Indeed, the trend is according to , which is evident from the plot shown in Figure 6.38, which is the result of dividing the values plotted in Figure 6.37 by .
Figure 6.38: The exposed behavior of the average subtree size plotted in Figure 6.37.

Additionally, it is interesting to verify whether these results also hold under different circumstances, for instance when we change the cost function (4.3) to include routing issues. For the moment, we only mention that a sophisticated (global) routing scheme, denoted by SPBH I, is used, which will be discussed in detail in Chapter 7. The cost function weights are set to , and . The obtained results indicate that the average subtree size grows according to a function which lies in between and . Finally, we show in Figure 6.39 the CPU time of incremental packing optimization versus non-incremental packing optimization, for a range of randomly generated benchmarks and the largest MCNC benchmark, ami49.
Figure 6.39: The CPU time of incremental and non-incremental packing optimization as a function of the problem instance size, without routing considerations.

It is clear from this plot that the incremental placement computation approach outperforms the non-incremental placement computation approach starting from about . This means that the incremental approach is practically feasible and leads to increasingly larger improvements as grows larger.

6.9.3 Conclusions
The assumption of equiprobable sequence-pair selection for analyzing the average computational complexities in connection with incremental sequence-pair-to-packing computation is justified. With and without consideration of global routing, the average subtree size of a longest-paths graph with nodes lies between and . The previous complexities hold both for a single sequence-pair-to-packing iteration as well as for an actual sequence of iterations within a practical SA optimization run. From the experimental results we can clearly observe that the performance of the optimization framework depends on the size of the problem instance at hand, and is arguably dependent on the quality of the generated perturbations (moves). It is likely that a more sophisticated perturbation scheme which generates (mostly) better moves, in terms of a higher probability of acceptance, will improve


overall performance of the SA optimization framework. Finally, we observed an approximately one-to-one correlation between the complexity of a single sequence-pair-to-packing iteration and the time required for the optimization process to arrive at a final solution. Since the average computational complexity of a single incremental placement computation iteration is , which is better than the computational complexity of any from-scratch placement computation algorithm (either one of , , and ), the overall run time of an incremental SA optimization run must be better for all . Indeed, when we compare a small-constant-factor quadratic implementation with our (unoptimized) incremental implementation, we find .

6.10 Placement-to-Sequence-Pair Mapping


For several reasons, one can imagine that efficient mapping of a given placement of modules to a sequence-pair representation is useful. A few of the many possible applications are directly related to the following scenarios.

When a placement is input via a graphical user interface, one needs to translate it to a sequence-pair representation before any automated concepts can be applied to the placement.

An actual placement gives exact information on the modules spatially close to a specific module, as opposed to the relative sequence-pair representation or the (equivalent) oblique grid representation. Therefore, the actual placement is better suited for use in connection with choosing distant or close modules, which is useful for selecting perturbation types in a sophisticated implementation of simulated annealing.

As will be demonstrated in Chapter 8, Section 8.7, efficient enumeration of all modules in a placement is very useful for the minimization of performance-degrading physical coupling phenomena. We argue that efficient enumeration is a key ingredient for placement-to-sequence-pair computation.

In [48] a method called gridding was introduced to map a packing to an equivalent sequence pair (SP) representation. However, the described approach is quite ambiguous, in the sense that the packing can be mapped to more than one SP. Furthermore, if modules are placed with cutting zones of slack space, such as in Figure 6.40, it is impossible to apply Murata's gridding procedure. As argued in [94], it is possible to determine a sequence pair from a given packing by the following procedure. To determine sequence , push the modules out of the packing in a top-left order, without having to move aside any other module. For sequence the same push-out procedure can be applied, except that it should now occur in a bottom-left order. It is known that this procedure does not always yield a unique solution, which can easily be seen from the example cases in Figure 6.41. We note a serious shortcoming in the previous push-out algorithm, namely the fact that not all placements can be mapped to the original (or equivalent to the original) sequence pair [94]. As a result, we establish a new observation, which is called the idempotent property of the placement-to-sequence-pair mapping. Formally, if we denote the mapping from a sequence pair to a packing by , and the mapping from a packing to a sequence pair by , then


Figure 6.40: The packing of 10 modules corresponding to SP .

Figure 6.41: The mapping from packing to SP is not always unique and depends on the actual sizes of the modules: (a) ambiguous, with candidate sequence pairs ((1, 2, 3), (2, 1, 3)) and ((1, 2, 3), (2, 3, 1)); (b) un-ambiguous, with sequence pair ((1, 2, 3), (2, 1, 3)).

is said to establish an idempotent mapping, or is simply called idempotent, if

(6.41)

Due to the fact that many sequence pairs can map to exactly the same packing (depending on the sizes of the modules), is not unique. In [94] an attempt is made to formalize this idea under the ambiguously defined notion of 1-dimensional compaction. We circumvent the non-uniqueness of in this discussion by stating a natural requirement for . In words, (6.41) means that when a packing is re-computed for a sequence pair which is obtained by applying mapping to the originally computed packing, then these two packings should be equal in every sense. The procedure for proposed in [94] does not guarantee an idempotent mapping as defined by (6.41). This can easily be seen from the packing in Figure 6.42, which is the result of that procedure applied to the packing of Figure 6.40, yielding the sequence pair . The discrepancy is caused by a cutting zone of slack space which isolates a module or group of modules from its left or lower neighbor module, while these modules could be shifted leftward or downward without incurring overlap.13 At least two ways exist to resolve the problem:

Remove cutting zones of slack space by (virtually) enlarging the size of specific modules.

13 Note that the relative relationships dictated by the sequence pair are violated by doing so.


Figure 6.42: The method proposed in [94] to compute a sequence pair from a packing does not yield an idempotent mapping , which is shown clearly by this placement, where the previous location of a shifted module is drawn in a darker shade.

Adapt the naive packing-to-sequence-pair push-out method so that it takes cutting zones of slack space into account.

First, we formalize the packing-to-sequence-pair (P2SP) method into an algorithm. With a slightly adapted version of the area enumeration operation of the corner-stitching data structure (see Chapter 5), it is possible to realize the packing-to-sequence-pair algorithm. The algorithm, shown in Figure 6.43, has computational complexity $O(n)$, where $n$ is the number of modules.

1. Initialize the sequence and the queue $Q$ to be empty.
2. Enumerate all left-side modules in the packing using corner stitches and push them into $Q$ in order of occurrence from top to bottom.
3. Extract the topmost module from $Q$ and push it out.
4. Recursively push out the topmost right-side module, say $m$, of the last pushed-out module. If the possibility occurs to push out both a right-side and a bottom-side module, the right-side module has priority over the bottom-side module; if a bottom-side module is pushed out, the recursion continues from it.
5. If no more module can be found for pushing out and $Q$ is non-empty, then go to step 3, else stop.

Figure 6.43: Algorithm packing-to-sequence-pair (P2SP), which essentially enumerates the elements of sequence $\Gamma_+$ (and $\Gamma_-$) in an efficient manner and in a specific order.

The linear complexity can be easily understood as follows. Step 2 enumerates all modules at the left boundary of the chip area. In the worst case this takes $O(n)$ time; on average it is proportional to the number of modules on the left boundary. During step 3, the modules are extracted in a first-in-first-out manner. Hence, extracting all modules of $Q$ takes $O(n)$ time in the worst case. The most time-consuming operation of step 4 is the traversal of all neighboring modules at the right side of a given module. As neighbor enumeration is performed once for every pushed-out module, and the computational complexity (with hint) of neighbor enumeration is amortized constant, the amortized computational complexity of step 4 is constant per module when implemented efficiently. As a result, the overall (average) computational complexity is $O(n)$. The determination of sequence $\Gamma_-$ can be performed in a similar manner, except now the modules are pushed out starting from the bottom-left corner.
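The push-out loop of P2SP can be sketched as follows, with the corner-stitching operations abstracted away: `left` is assumed to list the left-boundary modules from top to bottom, and `right_of[m]` the right-side neighbors of `m` from top to bottom. The bottom-side priority rule of step 4 is omitted, so this is a simplified illustration rather than the complete algorithm.

```python
from collections import deque

def p2sp_first_sequence(left, right_of):
    """Emit the modules of one sequence in simplified push-out order."""
    seq, seen = [], set()
    q = deque(left)                    # step 2: left-boundary modules
    while q:                           # steps 3 and 5: FIFO extraction
        stack = [q.popleft()]
        while stack:                   # step 4: recursive right push-out
            a = stack.pop()
            if a in seen:
                continue
            seen.add(a)
            seq.append(a)
            # reversed, so the topmost right neighbour is popped first
            for b in reversed(right_of.get(a, [])):
                if b not in seen:
                    stack.append(b)
    return seq
```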


We will propose here a method to guarantee an idempotent mapping by virtually expanding specific modules. Therefore, we call this approach the expansion method. For the sake of simplicity, but without loss of generality, the algorithm is discussed for one dimension, i.e. the horizontal expansion case. Expansion uses the steps shown in Figure 6.44. Note that the placement is assumed to be converted into an equivalent corner-stitching data structure (see Chapter 5). As a result, every piece of space in the packing is represented by an empty rectangle or a non-empty rectangle (module).

1. Convert the packing into its corner-stitching representation.
2. Select the next unprocessed module $m$.
3. Check if all touching rectangles, found by neighbor enumeration with hint, to the right of $m$ are empty.
4. If so, then expand $m$ with the horizontally smallest rectangle size.
5. If not, then expansion is not possible for module $m$ (the module is tight).
6. If all modules have been processed then stop, else go to step 2.

Figure 6.44: An algorithm for expanding modules in one dimension.

Clearly, the algorithm has computational complexity equal to $O(n)$, since the modules are processed in an efficient order and step 3 is performed with amortized constant computational complexity. Note that an inefficient processing order of the modules could easily result in superlinear complexity. This occurs, for instance, when we need to search back and forth in the corner-stitching data structure for certain modules. Figure 6.45 shows the packing of 10 modules after (horizontal) expansion, which is the equivalent packing of Figure 6.40.


Figure 6.45: A (horizontally) expanded packing of 10 modules with SP ((2, 3, 4, 5, 6, 8, 1, 7, 0, 9), (0, 5, 7, 9, 6, 4, 2, 3, 8, 1)).

Note that in most cases there will be no slack space left after full expansion of a packing in two dimensions. The reader can easily verify that expansion implies that algorithm P2SP realizes an idempotent packing-to-sequence-pair mapping. The alternative approach to guarantee an idempotent packing-to-sequence-pair mapping, as mentioned earlier, is to refine algorithm P2SP such that ambiguity is prevented from occurring by making explicit use of the empty rectangles. This is not further elaborated in this thesis.
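For illustration, the horizontal expansion step can also be written naively over a plain rectangle list instead of the corner-stitching structure; the sketch below is $O(n^2)$ per pass, whereas the formulation of Figure 6.44 achieves the stated linear behavior. All names are illustrative.

```python
def expand_horizontally(mods, chip_width):
    """mods: {name: [x, y, w, h]} with lower-left corners; widens every
    module rightward until it touches another module or the chip edge."""
    for a, (ax, ay, aw, ah) in mods.items():
        limit = chip_width
        for b, (bx, by, bw, bh) in mods.items():
            # b blocks a iff it lies to a's right and overlaps vertically
            if b != a and bx >= ax + aw and by < ay + ah and ay < by + bh:
                limit = min(limit, bx)
        mods[a][2] = limit - ax    # (virtually) enlarge the module width
    return mods
```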


6.11 Constrained Block Placement


In a real-world layout problem, additional constraints are imposed on subsets of blocks that have to be placed. A common situation is the constraint on I/O blocks, which must be placed at the periphery of the chip. Moreover, in some cases it might be preferable to place blocks within a pre-specified area of the layout. Especially in the context of mixed-signal layout generation, such constraints are very important. Up to now we have ignored such spatial constraints imposed on the blocks in a placement, apart from the sequence-pair constraints. However, in practice they must be dealt with. Of course, the inclusion of such constraints should induce as little computational overhead as possible. Fortunately, the sequence-pair approach allows for the incorporation of constraints in an efficient manner. Let us first specify exactly what type of constraints we have.

Definition 22 (Range constraint) A block has a range constraint if a rectangular region is defined within which that block must lie.

A constrained block adheres to its range constraint if it lies inside that region; otherwise it violates its range constraint.

Definition 23 (Boundary constraint) A block has a boundary constraint if the block is to be placed at one of the four sides of the packing area. A corner constraint is also a boundary constraint; in that case the module is constrained to two adjacent boundaries.

Some authors use a pre-placed module constraint, but this constraint can be seen as either a special case of a range constraint or a boundary constraint. For example, if in the case of a range constraint the range is set to the actual width and height of a module, effectively the module is pre-placed. Note that a pre-placed module can also be seen as an obstacle in the placement area [46]. We will discuss hereafter the incorporation of range and boundary constraints into both the non-graph-based placement computation approach and the graph-based placement computation approach. The original idea for the former approach was introduced by Tang and Wong [47] very recently. Also, the incorporation of range and boundary constraints into the incremental graph-based placement computation approach is discussed. Note that matching does not fall under range constraints, because a range constraint requires a priori knowledge of absolute placement information. This information is not always available for matched modules. However, as discussed in Chapter 4, matching constraints can be taken into account using techniques such as described in [45].

6.11.1 Non-Graph-Based Constrained Placement


The original idea of sequence-pair-based block placement with constraints in the context of a non-graph-based approach will be sketched briefly. Tang and Wong [47] proposed to use so-called dummy blocks to enforce range and boundary constraints. The dummy blocks have no area but only a length or a width. Figure 6.46 shows this idea graphically.


Figure 6.46: (a) Range constraints are enforced by essentially four types of dummy blocks for each placement-constrained block; the dummy dimensions depend on the (desired) total chip width $W^*$ and height $H^*$. (b) Boundary constraints are enforced by four types of dummy modules which have dimensions that depend on the (desired) chip dimensions $W^*$ and $H^*$.

In Figure 6.46(a) block $b$ has a range constraint defined by four dummy modules: one to the left of $b$ with width $w_l$ and height 0, one to the right of $b$ with width $w_r$ and height 0, one at the bottom side of $b$ with height $h_b$ and width 0, and one at the top side of $b$ with height $h_t$ and width 0. Note that the dummy heights depend on the height of the chip, and the dummy widths depend on the width of the chip. Actually, these are pre-set desired values which are adapted during optimization. In [47] this issue is handled by defining initial values for both $W^*$ and $H^*$ such that $W^* \cdot H^*$ equals 150% of the sum of the individual block areas. During optimization by means of simulated annealing, both or either one of $W^*$ and $H^*$ are randomly chosen to be decreased by a certain amount when a constraint-violation-free placement is found. Furthermore, the cost function used in [47] is defined as

$\mathrm{cost} = \alpha \cdot W \cdot H + \beta \cdot L$   (6.42)

where $W$ and $H$ are the actual width and height of the chip area, $L$ stands for wire-length, and $\alpha$ and $\beta$ are weight factors. Ideally, at the end of the optimization process, $W = W^*$ and $H = H^*$. However, it is clear that during optimization, $W > W^*$ and $H > H^*$ occur frequently. In such a case we have the situation that the dimensions of the current placement do not comply with the desired dimensions $W^*$ and $H^*$. A straightforward solution would be to penalize such cases with very high cost values. A direct consequence of this measure is that the cost landscape is rendered unnecessarily irregular. This, in turn, badly affects the convergence behavior of the simulated annealing algorithm. Fortunately, there is a more elegant solution. Since by construction we have $W \geq W^*$ and $H \geq H^*$, we can simply put the realized $W$ and $H$ directly in the cost function of (6.42). Minimizing the cost function then implies minimizing $W$ and $H$. This approach seems to work remarkably well, as evidenced by the results in [47].14


For completeness, and for the sake of overview, we give the general non-graph-based placement computation algorithm of Figure 6.17 again in Figure 6.47. We call it the constrained maximum-weight common subsequence (CMWCS) algorithm. The essential difference is located on lines 12 and 14, which enforce the placement constraints for the horizontal case. The vertical case is similar.
Input: sequence pair $(\Gamma_+, \Gamma_-)$, element weights, and target dimension $W^*$.
Output: realized width $W$ of the chip area and (partially) constrained $x$ positions of all modules.

[Numbered pseudocode, lines 1-22: the maximum-weight common subsequence computation of Figure 6.17, extended on lines 12-14 with the enforcement of the horizontal placement constraints.]
Figure 6.47: The non-graph-based placement computation algorithm, which is essentially a constrained maximum-weight common subsequence (CMWCS) algorithm, directly handles range and boundary constraints on blocks. The essential difference with the general algorithm of Figure 6.17 lies in lines 12 to 14, in which the placement constraints are enforced. This algorithm applies to the placement computation for the horizontal direction.

Line 12 checks if module $b$ has a constraint by verifying (in constant time) whether or not it is an element of the set $C$ of constrained modules, which consists of the range-constrained modules $R$ and the boundary-constrained modules $B$. Of course, $C = R \cup B$.15 Line 2 assigns the correct values to the left- and right-side constraints associated with a range-constrained module. Lines 3 and 4 establish similar results for left-boundary and right-boundary constrained modules, respectively. If a module is constrained, then its position is adapted such that the constraint is adhered to. Essentially, the dummy modules associated with the constraints force the module-under-constraint into a certain preferred region of the placement area.

14 Unfortunately, the authors of [47] did not reveal any detailed information on their implementation of the simulated annealing algorithm. Hence, comparison with results of other works should be done with care.
15 Note that the top and right chip boundary locations are unknown beforehand.


At line 12 a left-side constraint on a module is enforced. At line 14 we check again if the module is constrained. This time the width of the total chip area is adapted in such a way that a violation of the right-side constraint induces a larger chip width. Hence, violations are penalized and thus minimized. A sketch of this enforcement is given below. The overall computational complexity of algorithm CMWCS can be derived in a similar fashion as for algorithm MWCS shown in Figure 6.17, being $O(n \cdot q(n))$, where $q(n)$ is the amortized computational complexity of the priority queue operations. If we use a priority queue based on splay trees we obtain $O(n \log n)$, and if we use a Van Emde Boas data structure [58, 59] a better result, $O(n \log \log n)$, is obtained.

We observe an important point which is worthy of further investigation, because it can lead to a simplified overall optimization algorithm and to better placement results. The observation is that the use of stochastic adaptation of the target dimensions adds to the computational complexity of the problem. Moreover, the impact of this stochastic adaptation on the overall performance is not known. Therefore, it is probably better to avoid it. We propose a modified 2-step algorithm to compute a constrained placement without the use of iterative adaptation of $W^*$ and $H^*$. The algorithm is shown in Figure 6.48. Essentially, the algorithm, which we denote by CMWCS2, is the same as the original algorithm except for the fact that the input target dimension is not needed anymore. The algorithm finds this dimension by performing a first placement pass (lines 1 to 16). The found value guarantees that no redundant margin is introduced. Consequently, no estimation error is made, which eventually leads to better results in fewer iterations. Experimental results which confirm this claim are presented next. Note that the boundary constraints can be easily generalized to include corner constraints. This is accomplished by enforcing both a top or bottom constraint and a left or right constraint. Experiments show that neither the run-time performance nor the solution quality deteriorates (under a reasonable number of constraints).
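The following fragment sketches one plausible reading of the enforcement on lines 12 and 14; the exact bookkeeping of the original listing is not recoverable from the source, so the names and the penalty rule are assumptions rather than the literal pseudocode.

```python
def enforce_h_constraint(x, w, lo, hi, chip_width):
    """x: tentative module position from the subsequence computation;
    (lo, hi): allowed horizontal span of the constrained module."""
    x = max(x, lo)                  # cf. line 12: enforce the left bound
    if x + w > hi:                  # cf. line 14: right bound violated,
        chip_width += (x + w) - hi  # so grow the realized chip width,
    return x, chip_width            # which penalizes the violation
```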

6.11.2 Implementation Considerations


In practice there might be various reasons to choose a range constraint. For instance, a range constraint can effectively cluster a given set of modules within a prespecified area of the chip. When no intervening modules are allowed between the modules that need to be clustered, the following technique can be used to accomplish this. Choose the range constraint in such a way that the square-like area is about 5% larger than the total area of all separate modules in the cluster (see the snippet below). Although the optimization algorithm will attempt to fit all modules of the cluster within the specified range, it is not unthinkable that a (small) range-constraint violation is introduced in the final placement, for instance in order to arrive at a smaller total chip area. Although this is a perfectly valid method to achieve effective clustering, the apparent dimensions of the chip area will be larger than the actual dimensions in case of a constraint violation. This is a problem when the final placement is computed and compared with other results. Therefore, an additional placement computation step is required which ignores all dummy modules at the top of and to the right of a constrained module, in order to compute the actual chip dimensions.
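As a worked example of the 5% rule (illustrative code; the thesis does not prescribe this exact routine):

```python
import math

def cluster_range(module_areas, slack=0.05):
    """Square-like range about 5% larger than the summed module area."""
    side = math.sqrt((1.0 + slack) * sum(module_areas))
    return side, side   # width and height of the range constraint

# e.g. modules of 4, 6 and 10 area units -> a range of about 4.58 x 4.58
```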


Input: sequence pair $(\Gamma_+, \Gamma_-)$ and element weights.
Output: realized width $W$ of the chip area and (partially) constrained $x$ positions of all modules.

[Numbered pseudocode, lines 1-38: a first, unconstrained placement pass (lines 1-16) that determines the realized width, followed by a second pass identical to algorithm CMWCS of Figure 6.47 which uses that width as the target dimension.]
Figure 6.48: The modified non-graph-based placement computation algorithm, denoted by CMWCS2, does not require the input of a target dimension, which is a tunable input parameter.
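The two-pass structure of CMWCS2 can be summarized by the following sketch; `cmwcs` stands for the one-pass routine of Figure 6.47 and is passed in as a callable, since its interface here is assumed rather than taken from the listing.

```python
def cmwcs2(sp, weights, cmwcs):
    """Two-pass constrained placement without a tunable target dimension.
    `cmwcs(sp, weights, target)` returns (realized width, positions)."""
    # pass 1 (cf. lines 1-16): unconstrained placement, used only to
    # measure the realized width, so no target has to be guessed
    width, _ = cmwcs(sp, weights, target=float("inf"))
    # pass 2: enforce all range/boundary constraints against that width
    return cmwcs(sp, weights, target=width)
```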

6.11.3 Experimental Results on Non-Graph-Based Constrained Block Placement


In order to verify the claimed improvement in performance (CMWCS2 versus CMWCS) of Section 6.11.1, we perform extensive experiments with the largest standard floorplanning MCNC benchmark, ami49. Its use is de-facto standard in many contemporary placement works. Furthermore, in the context of constrained placement problems, it appears acceptable to disregard all routing issues, as witnessed by many recent publications. For comparison purposes we will adhere to the same strategy and ignore routing. The ami49 benchmark contains


49 blocks with a diverse set of dimensions. Similar to the latest state-of-the-art publication on constrained placement [47], we set the following seven constraints. We select a block to be constrained to one of the four boundaries, and do this for each boundary. Moreover, we select three blocks to be constrained within the same preselected placement area. Furthermore, an attempt was made to choose blocks of the same size as in the reference publication, but due to a different block labeling scheme there might be some inconsistency here. However, a small inconsistency will not affect the results significantly. We ran both CMWCS and CMWCS2 20 times on ami49. Since we still have a tuning parameter in CMWCS, embodied by the decrement size of the target dimension(s) each time a constraint-violation-free placement is seen, we compare the CMWCS2 results with the best results from a series of CMWCS runs with different values of the decrement parameter. The set of values we used for this parameter is: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99 and 0.9999. More specifically, when a constraint-violation-free placement is found, the chosen target dimension, say $W^*$, is adjusted as $W^* \leftarrow d \cdot W^*$, with the decrement parameter $d$ set to, for example, 0.98. Figure 6.49 shows the behavior of the solution quality over 20 runs of the SA algorithm for various values of the decrement parameter. Both the average and the best solutions in terms of chip area are plotted. For comparison purposes, the average and best chip area obtained by algorithm CMWCS2 are also plotted in the same figure. We can see that algorithm CMWCS2 consistently yields solutions close to the best solution. Moreover, the best CMWCS2 solution is typically better than the best solution over all 220 runs of the CMWCS algorithm. In all cases the average CMWCS solution is significantly worse than the average CMWCS2 solution. It should be noted that a more honest comparison should compare 220 runs of algorithm CMWCS2 with the best solution obtained with algorithm CMWCS. Indeed, from additional experiments we obtained a value of 36.25 mm² after 80 runs, which is already better than the best result obtained with algorithm CMWCS. Consequently, we may state that algorithm CMWCS2 is very robust, does not require (problem-dependent) tuning, and yields excellent solutions. However, as is clear from the structure of the algorithms, CMWCS2 in its present form is bound to be significantly slower than CMWCS. Indeed, this is confirmed by Table 6.5, which summarizes some additional information gathered from the experiments.

Table 6.5: Experimental results of the original constrained placement computation algorithm (CMWCS) and the proposed improved version (CMWCS2). The averages are taken over 20 runs with random seeds.
algorithm   average #iterations   average #rejections   average CPU time [s]   average CA [mm²]   average slack space [%]
CMWCS       564776                364908                89.00                  37.14              4.56
CMWCS2      718405                469392                221.56                 36.75              3.55

When compared to the unconstrained optimization results (see Table 6.4), there is no apparent degradation in solution quality due to the imposed constraints. This is quite surprising. It implies that the taken approach for including placement constraints does not (significantly) deteriorate the convergence properties of the overall SA optimization algorithm. The CPU time of CMWCS-based optimization is substantially smaller than that of CMWCS2-based optimization. This is the only drawback of algorithm CMWCS2. However, it may be possible to improve the latter algorithm by using more sophisticated techniques. For instance, one could try avoiding recomputation of module positions which are not affected by constrained modules. This is not further explored in this thesis.


[Plot: packing results as a function of the decrement parameter value; chip area [mm²] (y-axis, 36 to 54) versus decrement parameter value (x-axis, 0.2 to 0.9), with curves for the CMWCS average and best chip area and horizontal reference lines for the CMWCS2 average and best chip area.]

Figure 6.49: A graph which shows the dependency of the final chip area of constrained placements obtained with algorithm CMWCS on the decrement parameter value. The problem instance under test is ami49. Each CMWCS value is the best or average of 20 runs. The horizontal solid and dashed lines are the average and best solutions, respectively, of the CMWCS2 algorithm over 20 runs.

The average number of iterations and rejections (of generated moves) gives a good indication of the quality of the overall optimization algorithm and enables platform-independent comparison of optimization times. An important feature which favors CMWCS2 over CMWCS is the fact that the former does not introduce additional and unnecessary tunable parameters which, among others, adversely affect the stochastic properties of the optimization. Additionally, the tunable parameter settings are likely to be problem-dependent. When we spend some more effort on the generation of a few additional constrained placement results, we can, for instance, obtain with CMWCS2 the placement shown in Figure 6.50. The result is significantly better than the current state-of-the-art [47], with a reported slack space of 6%. It is interesting to note that the latter result is obtained with about 850,000 iterations and 600,000 rejections [95]. If we constrain module 4 to the top-right corner and module 6 to the bottom-right corner, and set the constraints on the other modules as before, a run of the SA optimization algorithm could give the placement result shown in Figure 6.51.


Figure 6.50: A placement with modules 32, 4, 6, and 0 constrained to the boundary (left, top, right and bottom, respectively), and modules 2, 3, and 5 constrained within the same rectangular range indicated by the dotted lines. The chip area is 36.25 mm² with 2.22% slack space.
Figure 6.51: A placement with modules 32 and 0 constrained to the left and bottom, respectively. Modules 4 and 6 are constrained to the top-right and bottom-right corner, respectively. Modules 2, 3, and 5 are constrained within the same rectangular range indicated by the dotted lines. The chip area is 36.77 mm² and the optimization time is 204 CPU seconds.

6.11.4 Incremental Graph-Based Constrained Placement


The previously discussed range and boundary constraints, which can be imposed on any of the blocks in the placement problem, were used in a from-scratch computation scenario.


We extend this idea to the incremental computation scenario. As before, we perform incremental computations directly on the constraint graph representations. The incorporation of placement constraints into the incremental approach is quite straightforward. For simplicity only the horizontal case is discussed here. A module $b$ which has an associated constraint, i.e. $b \in C$, is only processed when it is an affected module. The distance update step for a constrained module $b$ is

$x_b \leftarrow \max\Bigl(\max_{u \in \mathrm{pred}(b)} (x_u + w_u),\ l_b\Bigr)$   (6.43)

which is very similar to the recompute_dist() function that is used for the incremental computation of longest paths; the left bound $l_b$ of the range constraint simply acts as an additional lower bound. For the dummy end node $t$ in the constraint graph, the following update step is sufficient:


$x_t \leftarrow \max\bigl(x_t,\ \max\{\, x_b + w_b \mid b \text{ is right-boundary constrained} \,\}\bigr)$   (6.44)

However, (6.44) can be updated more efficiently when the affected range-constrained and right-boundary constrained modules are stored in a separate list which keeps both the old information as well as the new information. As a result, (6.44) can be computed incrementally. Thus, the average computational complexity of incremental graph-based constrained placement is upper bounded by that of the unconstrained incremental computation plus a term proportional to the number of affected constrained modules. If the number of constrained modules is fixed and independent of the total number of modules $n$, it is easily seen that the placement computational complexity is not affected by the addition of constraints. Experimental verification of the effectiveness of incremental constrained graph-based placement computation has not been performed. Based on the experimental results of the constrained non-graph-based placement computations, it is expected that the computational complexity is not significantly affected. It is plausible to assume that previously established properties may be extrapolated to the current situation. A sketch of the constrained distance update is given below.
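The following is a minimal sketch of the constrained update (6.43), with assumed names (`dist`, `preds`, `widths`, `lo`) for the incremental longest-path bookkeeping:

```python
def update_constrained(b, lo, dist, preds, widths):
    """Recompute the longest-path distance of an affected module b,
    clamped from below by its left bound, as in (6.43)."""
    new = max((dist[p] + widths[p] for p in preds[b]), default=0)
    dist[b] = max(new, lo.get(b, 0))  # constraint acts as a lower bound
    return dist[b]
```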

6.12 Concluding Remarks


We have given an overview of state-of-the-art placement representations. These representations have been compared with each other with regard to important mixed-signal layout requirements such as generality and flexibility of module placement. Also, with respect to the requirement for a well-behaving simulated annealing stochastic optimization process, small sensitivity of a placement representation is argued to be of importance. The sequence pair (SP) placement representation is selected because of its attractive features. A drawback of SP is a comparatively long computation time for a single placement computation iteration, originally $O(n^2)$, where $n$ is the number of modules. We have shown how this quadratic complexity can be substantially reduced, to $O(n \log \log n)$, using fundamentally different approaches: graph-based algorithms, and weighted longest common subsequence algorithms. Although the graph-based algorithms are inferior in a worst-case scenario, they are very suitable to be generalized into an incremental approach.


Efficient near-optimal incremental placement computation algorithms have been proposed and implemented. Experimental results demonstrated the validity of the theoretical analyses and have shown the feasibility of the incremental approach. However, we note that in a practical framework the usage of both incremental and non-incremental approaches should be considered, depending on certain features of the stochastic optimization engine. Since the non-incremental approach is less sophisticated, it has smaller constant factors, hidden by the big-Oh notation, as compared to the larger constant factors due to the elaborateness of the incremental algorithms. The difference, however, can be further reduced by optimizing the implementation. We have shown that range and boundary constraints imposed on arbitrary modules can easily be taken into account without incurring computational complexity overhead. A modified algorithm is proposed which is easier to implement, is very robust, and consistently yields significantly better solutions than those in the current literature. Furthermore, it is shown how the idea of constrained module placement can be easily transferred to incremental placement computation.


Chapter 7

Routing
This chapter covers several aspects related to routing. The routing process is that part of the physical design step with the task of laying out the interconnect between preplaced geometrical structures, as defined in a point-to-point manner by the circuit netlist. The interconnect of a certain net, consisting of wires and vias, is also called the routing of that net. Placement without considering routing in a proper qualitative manner only makes sense in connection with designs where the quality of the interconnect has a negligible effect on system performance. Normally, these are low-performance designs. Contemporary state-of-the-art mixed-signal integrated circuits require high-quality layouts which are robust in every sense. With operating frequencies moving into the gigahertz range, and feature sizes going far into the ultra-deep submicron range, routing issues are becoming indisputably dominant. As a consequence, placement should take all quality aspects connected with routing into account. Unfortunately, a routing cannot be computed without having an idea of where the objects that have to be routed are located. But then, how can we find a good routing-aware placement? This problem naturally asks for an iterative approach. See Chapter 4 for a global overview of how placement and routing are integrated into an iterative optimization framework. First, the routing problem is defined exactly. Then, we give a general classification of routing approaches which facilitates categorization of relevant works. This is followed by a brief discussion of relevant previous work. A brief discussion on computational complexity is presented thereafter. We proceed by defining a routing model and routing algorithms which are most promising within a mixed-signal layout generation context. Based on experimental results, we will choose a routing heuristic that has the best performance compared to other heuristics, relative to optimal routing solutions, for a broad range of problem instances. The selection criteria are based on both run-time performance and routing quality. Also, emphasis is put on the incremental capabilities of the adopted routing methodology, consisting of a fast and effective graph-based routing heuristic in combination with an efficient irregular-grid routing model. For the chosen heuristic, we discuss extensions for incremental computations. The incremental routing heuristic is then evaluated and experimental results are reported. Finally, the overall routing methodology is integrated into the iterative optimization framework. Experimental results of the integrated placement and global routing approach are given and compared with existing state-of-the-art works. Furthermore, discrepancies in current works are exposed and discussed. The viability of the adopted methodology is further demonstrated by pinpointing and discussing areas for improvement. Finally, we end with some concluding remarks.


7.1 The Routing Problem


As mentioned earlier in Chapter 2, in our approach we do not allow over-the-cell routing. When a large number of metal layers is available, over-the-cell routing can be a solution to resolve routing problems, especially with regard to congestion. However, high-quality routing along module boundaries is always needed for any of the following reasons.

- Routing in higher-level metal layers requires the use of additional vias, which are expensive in terms of parasitics and yield.
- If intellectual property blocks are used in a module, parasitic interaction due to crossing wires with those blocks is best avoided, because it is unknown how the circuit performance will be influenced.

Furthermore, we assume that an interconnecting network of wires of which no wire runs over a module, and with minimal total length, is optimal.1 Moreover, we adopt a rectilinear wiring model in which wire segments are only allowed to go in either the horizontal or the vertical direction. Such a network is called an obstacle-avoiding rectilinear Steiner minimal tree [96, 19]. Finding such a tree is an NP-hard problem [97, 19]. A standard method to represent Steiner trees uses graphs, where the nodes represent junctions, bends, or crossings and the edges represent possible wiring segments. The graph approach can be applied without loss of generality thanks to Hanan's theorem, stating that a Steiner minimal tree exists in a Hanan grid [98]. The Hanan grid is a rectilinear grid in which the grid lines are induced by the pins, and their crossings form nodes in the graph. Naturally, the line segments are the edges in the graph. Formally stated, the global routing problem is as follows.

Problem: Steiner minimal tree (SMT) in a graph (GSMT)
Instance: A graph $G = (V, E)$, a set of pins $N \subseteq V$ that form a net, and a weight function $w : E \rightarrow \mathbb{R}^{+}$.
Solutions: All trees $T = (V_T, E_T)$ that connect the elements of $N$ in $G$, where $N \subseteq V_T \subseteq V$ and $E_T \subseteq E$.
Minimize: $\sum_{e \in E_T} w(e)$.

1 Symmetry considerations can be taken into account by employing additional algorithmic steps.

The nodes in the graph are represented by the set $V$. The edges in the graph, represented by set $E$, are undirected. The pins are also called demand nodes in this context, whereas the other nodes in $V$ are called Steiner nodes, which are candidate nodes for the trees in the solution set. There are two special cases which have polynomial-time complexity. The first case is $|N| = 2$. This case is also known as the single-pair shortest-path problem. Dijkstra's algorithm [99] can be used to solve it in time $O(|E| + |V| \log |V|)$, for instance with a Fibonacci heap data structure. Here $E$ and $V$ are the set of edges and nodes, respectively, which are contained in the equivalent graph enclosing all modules in the rectangular region defined by the modules attached to the 2-pin net at hand. A more efficient target-directed path search algorithm, called $A^*$, can be used to find an optimal path between two pins in time proportional to the number of edges (and nodes) on the path [100, 101]. The other special case occurs when $N = V$, where all the nodes in the graph need to be connected in a minimal


sense. This is called the minimum spanning tree (MST) problem, and it can be solved in $O(|E| + |V| \log |V|)$ time with Fibonacci heaps using Prim's algorithm [102]. In appearance, Prim's algorithm is very similar to Dijkstra's algorithm. The fundamental difference is that the former stores the edge weights associated with candidate extension nodes on the heap, while the latter stores the path lengths associated with candidate shortest-path extension nodes on the heap. Unfortunately, all other cases are known to be NP-hard. This even holds for many conceptually simplified versions of the Steiner minimal tree problem. For instance, the Steiner minimal tree problem in planar rectilinear graphs is NP-hard [19]. A minimal sketch of the $|N| = 2$ special case is given below.
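The sketch below illustrates the $|N| = 2$ case with Dijkstra's algorithm on an undirected, weighted routing graph stored as an adjacency map. A plain binary heap is used instead of a Fibonacci heap, giving $O(|E| \log |V|)$, which suffices for illustration; all names are illustrative.

```python
import heapq

def shortest_path_length(adj, s, t):
    """adj: {node: [(neighbour, edge_weight), ...]}; returns dist(s, t)."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d                       # first pop of t is optimal
        if d > dist.get(u, float("inf")):
            continue                       # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")                    # t unreachable from s
```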

7.2 Classification of Routing Approaches


A routing approach consists of a routing model and a routing algorithm. The model dictates how much information about the routing space is stored, essentially defining the solution space. The algorithm defines how we search for a solution in this space. Clearly, the routing model and the routing algorithm are strongly correlated. Which approach is best depends on the desired routing quality in connection with a definition of an optimal routing. In the literature, a large variety of routing models and algorithms can be found. Typically, no explicit distinction is made between the two; a certain routing model is used implicitly by a given routing algorithm. From this literature we can, roughly, classify routing approaches into:

- Single-step versus two-step approach. This is a classification based on hierarchy. The two-step approach essentially adopts a hierarchical divide-and-conquer strategy2 in order to (conceptually) simplify the problem and eventually find good solutions more efficiently3.

- Regular-grid versus irregular-grid approach. This is a classification based on the efficiency and effectiveness of the information representation; higher efficiency means less redundant information in the representation, and higher effectiveness means that (more) higher-quality solutions can be found using the representation. An advantage of the regular-grid approach is its conceptual and practical simplicity. However, it is mostly very inefficient in terms of space and time requirements.

2 The difference with the classical divide-and-conquer algorithm is that we do not impose a uniform conquer strategy.
3 In fact, the classification could be generalized into multi-step versus single-step, but even in those cases the bisection is dominant.

In our opinion it is advantageous to separate the routing model and the routing algorithm explicitly. Consequently, both items can be constructed, analyzed and improved separately. Also, we gain more insight into the properties of both items. In the context of mixed-signal layout generation we choose a two-step routing approach based on an irregular-grid model. The reasons for this choice are as follows (first two reasons why a 2-step approach is preferred, followed by two reasons why an irregular-grid model is preferred):


- A two-step approach consisting of a global routing step followed by a detailed routing step enables controllable routing quality refinement, which is advantageous in an iterative optimization framework such as simulated annealing.
- A two-step approach mitigates the problems in connection with properly handling all routing-related issues, which typically have complex interdependencies, at the same time. By introducing hierarchy, the problems can be made more manageable.
- A routing strategy requires a routing model which has as little redundant information as possible and, at the same time, guarantees the existence of an optimal solution.
- An overall efficient integrated placement and routing approach requires a low-complexity algorithm to compute the necessary routing information from a given placement.

Furthermore, a very important additional requirement is an efcient update mechanism of (small) dynamic changes in the graph. For instance, after a small change in a placement due to a perturbation operation, it would be a waste of computation time to re-compute the whole routing graph again from scratch. Clearly, a signicant gain is possible if the routing graph can be efciently updated in an incremental sense. Below, the classications based on routing hierarchy and routing model are discussed more in depth. Note that a choice with regard to the routing hierarchy is essentially independent of a choice with respect to the routing model.

7.2.1 Routing Hierarchy


At a high level, essentially two different approaches exist to accomplish routing of a circuit. These approaches are extremes in a hierarchical sense. In practice, a suitable combination of these approaches is well imaginable to find a better routing.

Single-step Routing

A non-hierarchical single-step routing approach performs routing including the exact determination of the wire segments in the plane. We define single-step routing to be synonymous with area routing. Although area routing is commonly associated with maze routing, we make a strict distinction between the two in this thesis.4 Area routing is defined as detailed routing on a global basis. Hence, the underlying routing model is not specified. A specific implementation of an area router could therefore use a regular-grid routing model or a tile-based routing model. The latter is a generalization of the former model in the sense that each tile can be associated with a set of grid elements. Furthermore, maze routing is defined as a routing approach based on a regular-grid model. Essentially, the actual routing algorithm is not specified. Usually, a shortest-path(s)-like algorithm is employed in connection with maze routing. Maze routing can be employed on a local as well as on a global level. Due to the fact that a single-phase routing approach lacks a global view of the routing problem, problems can easily arise. We name a few problems which are especially eminent in our mixed-signal layout generation framework.
4 Lengauer [14] already noted that the distinction between area routing and some detailed routing approaches can become very fuzzy.


- It is very difficult to predict whether or not a net is routable with specified quality margins, due to previously routed nets which form obstacles for succeeding nets. Consequently, it is almost impossible to solve the routing problem adequately, i.e. to find near-optimal solutions for all nets with respect to the input specifications, for all but the simplest problem instances.
- It is difficult to evenly spread the wires over the chip area while at the same time targeting good solutions for all nets.
- Computational complexity increases very rapidly due to forced ripup-and-reroute strategies needed to achieve compliance with specifications. Furthermore, the computational complexity is very hard to analyze and bound.
- From an algorithmic point of view, it is difficult to comprehend the impact on the output of a routing algorithm as a function of tunable algorithmic parameters. As a consequence, possible improvements are based mainly on trial and error, which can overshadow possible fundamental improvements based on scientific insight.

As a side note, it is questionable whether fixating on details, while the global line has not been formulated yet, is a fundamentally sound approach. We argued that for large circuits the area routing approach is infeasible because of the complexities that are involved in managing all details on a global level. However, regional area routing approaches can give excellent results when the size of the area that is routed in a single phase is bounded [103]. A practical implication of this fact is the importance of defining manageable regions for area routing. Essentially, this is accomplished by means of hierarchy.

Two-step Routing

An approach that uses hierarchy to split the overall problem into conceptually easier-to-grasp sub-problems is the well-known two-step routing approach, consisting of global routing followed by a detailed routing step. Important advantages of two-step routing are as follows.

- The optimization of global routes is conceptually simpler without considering detailed routing aspects, and the optimization of detailed routes is conceptually simpler without considering global aspects.
- Algorithm design and analysis is simpler for spatially confined detailed routing problems. Moreover, the quality of routing results is more easily assessed as a function of algorithmic features.

In the context of an iterative optimization framework, the possibility to trade off run-time against solution quality is of paramount importance; typically, detailed routing becomes increasingly more important when the global routes are approaching the status of being good enough. An example scenario is appropriate to illustrate the idea of hierarchical routing. Imagine that the total layout area is divided into segments by overlaying a grid on the plane. During global routing it is determined through which grid cells of the layout area a global route will go. Every grid cell has an associated capacity which limits the maximum number of


global routes that can pass through that cell.5 The actual number of routes through a cell is called the demand. When the demand is larger than the capacity, we speak of routing congestion. Typically, more crucial nets are routed first, followed by less important nets, in order to satisfy imposed timing or wire-length constraints. After all global routing congestion has been resolved, the detailed routing step is started. During detailed routing, the actual geometric location of each wire is computed, guided by the global routing information. Generally, channel and switchbox generation is needed before the detailed routing step. It can occur that detailed routing is impossible with the given placement and global routing information. In such cases some of the global routes are removed and, with adjusted constraints, re-routed. This classical approach is called ripup-and-reroute. We stress here that the aforementioned hierarchical approach serves only to illustrate the idea and is not the approach we advocate in this thesis. From the previous discussion it is clear that the best choice for mixed-signal layout generation is a two-step approach. Note that the routing model that is used in the two steps need not be the same.

7.2.2 Routing Model


Depending on the desired routing accuracy, the affordable computation time, the affordable implementation effort, and the definition of the optimal solution, a certain routing approach can be employed. For the implementation of any routing approach, a representation of the routing information, i.e. the routing model, is required. From an information representation point of view these models can be classified into irregular-grid-based and regular-grid-based. Hereafter, in-depth information on these routing models is given, illustrated by their use within a typical routing environment.

Regular-Grid Routing Model

Plane-based routing uses the 2-dimensional plane as its routing space. Usually, only integer coordinates are allowed, avoiding numerical problems without adversely affecting the routing performance or limiting algorithmic flexibility. As a consequence, the relevant points in such a plane lie on a regular grid. This grid can be used directly to perform routing, by searching for a sequence of grid points which interconnect the pins of a net. The grid can also be used to find congestion-avoiding routing solutions by associating each grid-line segment with a certain capacity which depends on how many routes we allow to cross that segment. The size of a single grid tile can be adapted according to, for instance, some hierarchical scheme. Usually, the congestion-minimization problem is cast into some kind of flow problem which is modeled by a graph [104]. A well-known routing strategy based on a grid is due to Lee [105]. The associated routing algorithm is also called a maze router. The main drawback of maze routing is its high memory usage and large computational complexity. In terms of an $h \times w$ grid, the memory usage is $O(h \cdot w)$ and the time complexity is at best $O(h \cdot w)$, when a linear-time algorithm is used to compute single-source shortest paths [106].
5 Equivalently, we can treat each cell as a node, with adjacent cells inducing corresponding edges in a (grid) graph, and apply the same ideas.
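A minimal, illustrative sketch of Lee-style wave expansion on such a regular grid follows; obstacle cells are marked True, and the routine returns the length of a shortest rectilinear path. It is an illustration of the principle, not the router used in this thesis.

```python
from collections import deque

def lee_route(blocked, src, dst):
    """Breadth-first wave expansion from src; returns path length or -1."""
    h, w = len(blocked), len(blocked[0])
    dist = [[-1] * w for _ in range(h)]
    dist[src[0]][src[1]] = 0
    q = deque([src])
    while q:
        r, c = q.popleft()
        if (r, c) == dst:
            return dist[r][c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w \
                    and not blocked[nr][nc] and dist[nr][nc] < 0:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return -1
```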


An advantage of maze routing is that it is precise in terms of wire locations. The latter renders maze routing also a useful candidate for detailed routing. The plane space can also be used directly to find solutions to a routing problem, without considering each grid point or grid tile on a (partial) path separately. Finding an interconnecting network for multi-pin nets can be performed by solving the rectilinear Steiner minimal tree (RSMT) problem. Each (grid) point in the plane is then a candidate Steiner node. Hanan's theorem tells us that it suffices to consider only the grid points which overlap with the Hanan grid [98]. Since the RSMT problem is NP-hard [97], we can also use an approximation algorithm to approximate an RSMT in time that depends only on the number of pins $k$ to be connected [107]. Note that this complexity is independent of the grid size. Because the RSMT only considers pin locations and Hanan grid points (which generally have little in common with modules in the plane), it ignores obstacles (represented by modules) in the plane. Therefore, it violates our non-over-the-cell routing requirement. However, as we shall see in Section 7.7, RSMT routing solutions are typically less than 6% away from obstacle-avoiding routing solutions for a wide range of routing instances.

Irregular-Grid Routing Model

Motivated by the massive memory requirements of grid-based routing and the inability of plane-based routing to take obstacles into account (while guaranteeing a (near-)optimal solution if it exists), researchers have thought of ways to minimize memory usage without precluding optimal or near-optimal routing solutions. Routing based on graphs has been shown to be very efficient in terms of computational effort. Furthermore, general graph theory is a broad and active field of research, with many useful techniques and algorithms that can be exploited. The graph-based routing approach relies on the proper definition of nodes in the graph which correspond to locations in the plane. The edges in the graph are used for routing. Each such edge represents a routing path segment which can be used for interconnecting a set of pins. Furthermore, weights can be associated with each edge to denote its importance, Manhattan length, capacity, or a combination thereof. In the extreme case where each node in the graph is a grid point and vice versa, the graph-based approach degenerates into a grid-based approach. The efficiency of the graph representation is directly affected by the efficiency of the grid representation since, typically, the computational complexity of a graph-based approach is expressed in the number of nodes and edges in the graph. Intuitively, it is clear that an irregular grid is more efficient than a regular grid, since the former uses a denser grid at locations in the plane where it is needed and a sparser grid at locations where this is allowed. A practical bottleneck of this non-uniform manner of information representation is to find out where and how much this grid is to be sparsified. Fortunately, efficient methods exist to perform this task, one of which is proposed in this thesis in Section 7.5.2. As a result, an irregular-grid routing approach is preferred over a regular-grid routing approach.

7.3 Previous Work


The field of routing research is very broad. In recent years, high-quality routing has attracted increasingly more attention due to the importance of routing in contemporary designs. In


this section, we will provide a global overview of previous works in the routing field that are considered of interest to mixed-signal designs. Especially those methodologies that apply to analog circuit routing are eligible candidates for application to mixed-signal layout generation. However, we do not confine ourselves solely to analog routing methodologies a priori, since there are no fundamental reasons why methodologies used in the digital domain cannot be useful in the mixed-signal domain. Typical approaches in analog routing are based on area routing. That is, the exact wiring pattern of each net is determined, taking into account previously routed nets and analog constraints such as parasitic resistances, capacitances, and crosstalk. This approach is advocated in the works of Cohn et al. [6], Lampaert [8], and Malavasi and Sangiovanni-Vincentelli [108]. A promising approach in which the current through a wire is also taken into account to size the widths of interconnecting wires is reported by Adler and Barke [109]. Other area routing approaches, from a digital point of view, are described by Tseng [103]. The works that advocate an essentially two-step routing approach, with a refined detailed routing step that incorporates or can incorporate timing, crosstalk, and parasitic constraints, are [41, 103, 110]. Generally, when a strategy is refined so as to take into account additional constraints related to performance degradation due to routing, it is denoted by performance-driven routing [111, 112, 3].

7.4 Computational Complexity


We observe that most routing approaches assume that a placement of modules is given. Under these circumstances, typically, adjusting the placement to improve routing is not considered. The basic strategy is to find a placement based on some heuristic, and space the modules in such a way that a suitable routing is expected to be found. The spacing can be based on given statistics, experience in the field, or a knowledge database. As the spacing usually contains some margin, an additional compaction step is usually employed to minimize area. Since compaction in itself is a very hard problem, it is better to avoid it. And it can be avoided if a better estimation of routing resources can be obtained. Our approach does not rely on a fixed placement for which a proper routing is to be found. Instead, we attempt to find a placement in which the compaction step can be skipped because the routing fits well within the given placement. This is only possible using iteration. Due to the fact that the number of iterations in the employed simulated annealing environment can be excessively large, the computational complexity of a single routing iteration should be as low as possible without giving in too much on quality. Therefore, efficient routing heuristics with low computational complexity are needed. Of course, the quality of such a heuristic should be adequate. The computational complexity of a routing methodology depends on the routing model and the routing algorithm. Both should be efficient in order to arrive at an overall efficient routing strategy. For the sake of minimizing computational complexity, a two-step, irregular-grid approach is more suitable in an iterative optimization framework. From the previous discussion it is clear that the routing model is a key element in efficient routing. We propose a global routing model which can efficiently capture all the information of global routes. A detailed routing model can then be used to exploit the gathered information to compute the exact geometries of the wires. We do not focus on detailed routing in this


thesis, but merely mention its importance to arrive at the final layout.

7.5 Global Routing Model


The proposed global routing model is similar to the routing model used by Cohoon and Richards [113], which is called an escape graph. Conceptually, the construction of an escape graph is easy. Each module boundary segment is extended maximally, either horizontally or vertically, until it hits another module or the boundary of the chip area. For the example placement shown in Figure 7.1(a), the corresponding escape graph is shown in Figure 7.1(b). We will use the terms escape graph and global routing graph interchangeably hereafter.


Figure 7.1: (a) A packing of ten blocks and (b) the derived global routing graph $G_{esc}$. After inserting the pins of a net we get (c) the extended global routing graph $G^+_{esc}$.

Note that the global routing graph, denoted by $G_{esc} = (V, E)$, must be extended for each net to include the pins of that net and their escape line segments. An extended global routing graph, shown in Figure 7.1(c), is denoted by $G^+_{esc} = (V^+, E^+)$. Of course, $|V^+| \geq |V|$ and $|E^+| \geq |E|$. Computing $G_{esc}$ is not trivial. Cohoon and Richards proposed an efficient line-sweep construction method. It is not clear from their approach how pins and their associated escape segments can be dynamically inserted and deleted from the escape graph. Since the construction of a static global routing graph is of limited use in the present context, we propose a new dynamic method based on corner stitching and a hash table.

7.5.1 Model Efficiency


The efficiency of a routing model is, among others, measured by the amount of information it carries. In the case of a graph model, which is by far the most common, the number of nodes and the number of edges account for the efficiency. In terms of storage requirements, the amount of required space is proportional to $|V| + |E|$, as can be concluded from the previous discussion; the complexity of constructing $G_{esc}$ is derived in Section 7.5.2. It is possible to reduce the latter by using implicit connection graphs as proposed by Zheng et al. [114], but since the routing process generally covers the complete packing space, this approach does not seem to provide


any substantial gain for our purpose. Another important issue is the ability of the global routing graph to represent effective solutions, i.e. solutions which are very close or equal to an optimal solution. Cohoon and Richards already showed that an optimal shortest path between any two nodes in the escape graph always exists [113]. More recently, Ganley proved the following interesting theorem [96].

Theorem 8 The extended global routing graph $G^+_{esc}$

contains an optimal rectilinear Steiner minimal tree.

This result is of interest when we want to find an optimal solution for a nontrivial multi-pin net where the number of pins is larger than two.

7.5.2 Global Routing Graph Computation


The recipe to construct $G_{esc}$ from a given placement is given in Figure 7.2.

1. Convert the sequence pair to a packing using constraint graphs.
2. Convert the packing to an equivalent corner-stitching data structure [52], incorporating all escape line segments.
3. Segment the perimeter of the corner-stitched modules into horizontal and vertical line segments and sort them with increasing $x$ and $y$ coordinates, with $x$ as the primary sort key and $y$ as the secondary sort key.
4. Sequentially insert the nodes and edges implied by the line segments into a hash table that holds the global routing graph. Nodes are inserted explicitly (based on their coordinates), and since each node can have at most 4 incident edges, all incident edges are kept explicitly within a node.

Figure 7.2: Construction of the global routing graph.

The computational complexity of step 1 is determined by the packing complexity, which is at best $O(n \log \log n)$ for a from-scratch computation (see Chapter 6). For step 2 we have to insert each module sequentially into a corner-stitching data structure. All absolute module positions are known, so we can insert the modules using the depth-first search order of the modules in, say, the horizontal constraint graph. When all modules are put into an equivalent corner-stitching data structure, the exact locations of all line segments in the placement are known. Furthermore, due to the maximally-horizontal empty tile property of corner stitching, all horizontal escape line segments are generated automatically. Step 3 comprises the enumeration and segmentation of each empty and non-empty tile in the corner-stitching data structure. The required complexity is $O(\ell)$, where $\ell$ is the total number of empty and non-empty tiles. Since there are $n$ non-empty tiles, $\ell \geq n$. In the worst case $\ell$ grows quadratically in $n$, but typically $\ell = O(n)$. The underlying reason is that in a typical placement, each module will be shielded by a number of surrounding modules from the rest. As a consequence, a typical escape line affects only a bounded number of surrounding tiles. Therefore, sorting of the line segments can typically be performed in $O(n \log n)$ time.6 Finally, in step 4, all line segments can be traversed sequentially, each pair
6 Theoretically, this can be reduced to $O(\ell)$ with a sorting algorithm such as bucket sort [34], under the assumption that the distribution of the line segments has a (uniformly) random behavior. This results in a linear overall complexity of the algorithm.

7.5 Global Routing Model

137

generating at most three edges in . The latter property can be easily seen from the example segmentation of the right-side segment of module 5 and the left-side segment of module 6 in Figure 7.1(a), drawn with a dotted line. The resulting edges are denoted in Figure 7.1(b) by , and . Insertion in the hash table requires per node, with at most four edges being implicitly represented within each node. Summarizing, the whole procedure can be performed in .
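As an illustration of step 4, the sketch below shows how graph nodes can be stored in a coordinate-keyed hash table with expected constant-time lookup. This is a minimal sketch in C, not the actual thesis implementation; the Node type, the table size and the hash function are illustrative choices.

```c
#include <stdlib.h>

#define TABLE_SIZE 65536              /* power of two -> cheap masking */

typedef struct Node {
    int x, y;                         /* absolute coordinates            */
    struct Node *edge[4];             /* at most 4 incident edges        */
    struct Node *next;                /* hash-bucket chaining            */
} Node;

static Node *table[TABLE_SIZE];

static unsigned hash_xy(int x, int y)
{
    /* simple coordinate mix; any reasonable hash will do */
    unsigned h = ((unsigned)x * 2654435761u) ^ ((unsigned)y * 40503u);
    return h & (TABLE_SIZE - 1);
}

/* Return the node at (x, y); create it if it does not exist yet.
 * Expected O(1) per lookup, which yields the claimed construction time. */
Node *node_lookup(int x, int y)
{
    unsigned h = hash_xy(x, y);
    Node *n;
    for (n = table[h]; n != NULL; n = n->next)
        if (n->x == x && n->y == y)
            return n;
    n = calloc(1, sizeof *n);
    n->x = x;
    n->y = y;
    n->next = table[h];
    table[h] = n;
    return n;
}
```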

7.5.3 Supporting Dynamic Changes


Changes are applied continuously to the global routing graph, either due to insertion of the pins of a specific net or due to a change in the placement. Consequently, updating the global routing graph after such a change occurs should be done in an efficient manner.

Dynamic Net Change

One of the difficulties in dynamically maintaining a global routing graph is due to the insertion of new pins. The deletion of pins and their escape segments is relatively easy. As we know where an existing pin is located, we can iteratively delete an escape line segment and its incident nodes until we hit an obstacle. Figure 7.3 shows graphically what happens during deletion of a pin. Suppose the pin was inserted and caused the creation of three escape line segments (edges), as drawn in Figure 7.3(a). When we want to delete the pin, first its incident escape edge has to be deleted. Then we can delete the node associated with the pin itself. After deletion of a node, the two perpendicular edges it separated, if they exist, must be joined again into a single edge, unless the node must remain there because it is a module corner point or induced by some module corner. If, after deleting the first escape edge, we have not hit an obstacle yet, we delete the next edge and its incident node (Figure 7.3(b)). Finally, after deleting the last escape edge and its incident node, we notice that the remaining incident node is connected to a module. Thus, an obstacle has been found and the final node is deleted, after which the edge that was split by it is restored to its original condition (Figure 7.3(c)).

Figure 7.3: (a) A part of the global routing graph with a pin and its escape line segments. (b) The intermediate result after deleting the pin and its first two escape segments. (c) The global routing graph after deleting the pin and all of its escape line segments.
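The deletion procedure can be summarized in code as follows. This is a condensed sketch under the assumption of hypothetical helper functions (degree(), only_neighbour(), is_corner_node(), remove_edge(), remove_node(), join_collinear()); it is not the thesis implementation and it glosses over some of the corner cases discussed above.

```c
typedef struct Node Node;

extern int   degree(Node *n);            /* number of incident edges       */
extern Node *only_neighbour(Node *n);    /* other end of the unique edge   */
extern int   is_corner_node(Node *n);    /* module-corner node: must stay  */
extern void  remove_edge(Node *a, Node *b);
extern void  remove_node(Node *n);
extern void  join_collinear(Node *n);    /* merge the two edges split by n */

void delete_pin(Node *pin)
{
    Node *cur = pin;
    while (cur != NULL && degree(cur) == 1) {
        Node *nxt = only_neighbour(cur);
        remove_edge(cur, nxt);
        remove_node(cur);                /* first the pin, then inner nodes */
        if (is_corner_node(nxt))         /* obstacle reached: node remains  */
            break;
        if (degree(nxt) == 2) {          /* nxt merely split a straight edge */
            join_collinear(nxt);         /* restore the original single edge */
            remove_node(nxt);
            break;
        }
        cur = (degree(nxt) == 1) ? nxt : NULL;
    }
}
```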


Insertion of a pin and its escape line segments into the global routing graph requires much more thought, since we do not know in advance where an escape segment might cross another line (edge). At such a crossing, the edge needs to be split and a node has to be inserted. Fortunately, the corner-stitching data structure can mitigate the problem because it can find a closest point quickly when given a good hint (in this case, a pointer to the last found module), and going to a neighboring edge also takes constant time. Thus, reconstructing the graph after inserting a pin requires complexity essentially proportional to the number of escape line segments induced by the pin. And, following the same line of reasoning as above, this number is typically a constant. The total computational complexity required for performing all insertion and deletion steps is proportional to the number of escape line segments generated during insertion of the pin.

From experiments with randomly generated placements, we found that the number of nodes is usually smaller than 15 times the total number of modules in the placement, and that the number of edges is usually smaller than 30 times the number of modules. Since randomly generated placements are normally quite sparse (containing a lot of slack space), global routing graphs associated with optimized placements are much smaller. Figure 7.4 gives an impression of the number of nodes and edges in global routing graphs derived from randomly generated placements for a wide range of instance sizes.

Figure 7.4: (a) The number of edges and nodes in the global routing graph for a wide range of randomly generated placements is an approximately linear function of the placement instance size. (b) The ratio of the number of edges and the number of nodes as a function of the placement instance size converges to approximately 2.

It is clear from Figure 7.4(a) that the number of nodes and edges in the global routing graph increases linearly with the placement instance size n. Furthermore, Figure 7.4(b) shows that the average ratio of the number of edges over the number of nodes becomes approximately 2 for increasing n. This observation is important enough to put in a Claim.

Claim 1 The size of the global routing graph depends, on average, linearly on the number of modules in a random packing.


A direct consequence is stated by the following Corollary.

Corollary 3 The number of nodes and edges in a connected subgraph of the global routing graph is a linear function of the number of modules in the associated confined routing region.

Based on these results, we may conclude that pin insertion and pin deletion (including escape line segment processing) take essentially constant time, on average.

Dynamic Placement Change

A concern in a dynamic environment where placements change often is the complexity of updating the global routing graph after a placement change. Fortunately, the corner-stitching data structure allows for dynamically inserting and deleting modules in an efficient way. Clearly, the graph can be updated directly when a module is inserted into or deleted from the corner-stitching data structure. This requires some localized operations on the graph, which take essentially constant time per module with the aid of corner-stitching operations. A way to minimize computational complexity is to use a double-wave technique. The first wave clears the way for the second wave by deleting all affected modules. As soon as enough space has been cleared by the first wave, the second wave rebuilds the placement by inserting modules. This idea is shown in Figure 7.5.

Figure 7.5: (a) A placement change can lead to a set of affected modules located in the affected region. (b) This region is cleared by a clearing wave, directly followed (c) by a rebuilding wave that inserts affected modules at their new locations, until the affected region has been fully rebuilt.
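A possible shape of the double-wave update is sketched below. Module, Point and the cs_* corner-stitching helpers are hypothetical placeholders; the actual bookkeeping (which graph nodes and edges to strip and rebuild) is more involved than shown here.

```c
typedef struct Module Module;
typedef struct { int x, y; } Point;

extern void cs_delete_module(Module *m);           /* clearing wave     */
extern void cs_insert_module(Module *m, Point p);  /* rebuilding wave   */
extern int  space_cleared_for(Module *m);          /* room at new spot? */

void double_wave_update(Module **affected, const Point *new_pos, int count)
{
    int del = 0, ins = 0;
    while (ins < count) {
        /* wave 1: delete affected modules until the next insertion fits */
        while (del < count && !space_cleared_for(affected[ins]))
            cs_delete_module(affected[del++]);
        /* wave 2: re-insert the module at its new location */
        cs_insert_module(affected[ins], new_pos[ins]);
        ins++;
    }
}
```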

7.6 Global Routing Algorithms


Now that an accurate global routing model has been defined, we have to define how to search for a tree which connects a set of given pins in the global routing graph. Ideally, this tree should be minimal in cost, which is defined to be a Steiner minimal tree in the extended graph. Unfortunately, finding such a tree is NP-hard [97], even in a planar graph such as ours. Since the size of a routing problem instance can be large, we have to resort to efficient heuristics which can find near-optimal solutions in as little time as possible.


In the following subsections we give an overview of several existing heuristics, where we explicitly distinguish (exact) algorithms for 2-pin nets in Subsection 7.6.1 and (heuristic) algorithms for multi-pin nets, containing at least 3 pins, in Subsections 7.6.2 through 7.6.5. We also propose a few modified versions of these heuristics. The performance of these heuristics is compared with optimal solution values on a broad range of synthesized problem instances. The optimal solutions are obtained with the help of several state-of-the-art programs that incorporate advanced techniques [115, 13]. The synthesized problem instances are directly derived from actual sequence-pair-based placement results with randomly generated block sizes and nets. We choose the best-performing heuristic as the basis of a cost-constrained pin-to-pin global router. Furthermore, the heuristic is modified to incorporate incremental routing of partially changed routing segments, which can be, for instance, a consequence of placement-induced changes in the routing graph.

7.6.1 Two-pin Routing Algorithms


As pointed out before, the problem of finding an optimal (shortest) pin-to-pin route is solvable in polynomial time. Several exact algorithms exist that can perform this task efficiently within the proposed routing graph model. Probably the most widely known algorithm is Dijkstra's single-source-shortest-paths (SSSP) algorithm [99], which has complexity O(n log n) on planar graphs. A more efficient algorithm, and provably optimal in a wide sense, is the A* algorithm. This algorithm was originally introduced by Nilsson [100] and further elaborated by Pearl [101]. The essential differences between the SSSP and A* algorithms are as follows.

- The SSSP algorithm finds shortest paths between a single source pin and all (reachable) other nodes in the routing graph, whereas the A* algorithm finds a shortest path between a single source pin and a single target pin of a 2-pin net, thereby avoiding the exploration of a huge number of irrelevant nodes.

- The SSSP algorithm does not use a priori information on the location of the target pin, whereas the A* algorithm owes its efficiency to its target awareness.

We discuss both aforementioned algorithms briefly because they are used quite extensively in our framework. For instance, in the multi-pin net routing algorithms, two-pin routing problems are encountered and solved iteratively. Details will be given shortly.

Single-Source-Shortest-Paths (SSSP) Algorithm

A general-purpose version of Dijkstra's SSSP algorithm explores each node in the graph that is input to the algorithm. In case of a pin-to-pin path search problem it makes sense to stop when a shortest path connecting the two pins has been found. Therefore, we propose a modified version of Dijkstra's algorithm, which is shown in Figure 7.6. The difference is essentially due to the confinement of routing space. Whereas Dijkstra's original algorithm processes all nodes and edges in the graph, our modified algorithm processes only the nodes and edges within the a priori defined relevant region. Hence, the computation time is significantly reduced. In the remainder of this thesis we will refer to the modified SSSP algorithm simply as the SSSP algorithm, unless explicitly noted otherwise. First, we describe the basic operation of the algorithm on a more intuitive level. Thereafter we give a more formal description. Assume all node distances are initially set to a very large value. We start exploring


the graph from the predefined source node. Each incident (weighted) edge is explored and the connecting nodes are conditionally put in a priority queue, keyed by their distance from the source node. We continue this process with the cheapest (shortest-distance) node extracted from the priority queue. Note that the extracted node might be a node that was discovered long before the expansion of the last node. One can see that the exploration of the nodes and edges occurs in a manner similar to an outward wave propagation induced by a falling drop of water at the source node location. The algorithm stops when the wave front hits the target node and a shortest path between source and target node has been established.
Input: routing graph G = (V, E) and two pins, source s and target t, to be routed
Output: shortest paths from source pin s to all explored nodes, in particular to t

 1  foreach v in V do d(v) <- infinity; parent(v) <- nil od      (initialization)
 2  d(s) <- 0
 3  insert(Q, s)
 4  while Q != {} do
 5      u <- extract_min(Q)
 6      if u = t then break
 7      foreach v with (u, v) in E do
 8          if d(v) > d(u) + w(u, v)
 9          then d(v) <- d(u) + w(u, v); parent(v) <- u
10              if v not in Q
11              then insert(Q, v)
12          fi
13      od
14  od
15  while Q != {} do
16      u <- extract_min(Q)
17      if d(u) >= d(t) then break
18      foreach v with (u, v) in E do
19          if d(v) > d(u) + w(u, v)
20          then d(v) <- d(u) + w(u, v); parent(v) <- u
21              if v not in Q then insert(Q, v)
22          fi
23      od
24  od
25  if d(t) < infinity then backtrack the shortest path from t via the parent pointers

Figure 7.6: Dijkstra's single-source-shortest-paths algorithm applied to a 2-pin problem instance. The general algorithm is modified to stop as soon as an optimal path is found. Note that this does not necessarily happen upon the first encounter of the target pin.

We proceed with a more formal discussion of the (modified) single-source-shortest-paths algorithm given in Figure 7.6. All nodes in the graph are initially set to a very large distance value, except for the source pin s, which has its distance value set to zero at line 2. On line 3 the source pin is inserted into the priority queue Q. Then in the loop from line 4 to line 14 the following actions are performed iteratively until either Q becomes empty or the target pin t is encountered. The cheapest node u in the priority queue is extracted for expansion using extract_min(). Expanding a node means that all its adjacent nodes are explored. If the distance value of an adjacent node v is larger than the distance value of the expanding node u plus the weight of edge (u, v), then node v is relaxed on line 9. Relaxing a node means that its distance field is decreased to the lowest currently known distance value to that node.


In case a relaxation step is performed, the relaxed node v will have its parent pointer set to the expanding node u. These parent pointers are useful for backtracking the shortest path nodes and edges when the algorithm finishes. Moreover, if a relaxed node is not in the queue yet, then it is inserted into Q on line 11. The loop from line 15 to line 25 makes sure that the found shortest path between s and t is indeed shortest. In order to guarantee this, the algorithm extracts the nodes in the queue which could possibly lead to a shorter path. When a node is extracted from Q with distance not smaller than the current distance value of t, it is clear that no improvement is possible and the algorithm can stop. It can be verified that when the weight function is positive, which is the case when the weight of an edge is equal to its Euclidean length, the algorithm is guaranteed to find an optimal path. Furthermore, the worst-case computational complexity is O((|V| + |E|) log |V|) when the priority queue is implemented as a heap [34].
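For concreteness, a compact C sketch of the modified SSSP algorithm is given below. It merges the two loops of Figure 7.6 into a single loop with the equivalent stopping criterion d(u) >= d(t). For clarity the priority queue is replaced by a linear scan, giving O(|V|^2) behavior; an actual implementation would use a heap, as assumed in the complexity bound above. The adjacency lists adj[] and the node count nv are assumed to be filled in elsewhere.

```c
#include <stdio.h>

#define MAXV 1024
#define INF  1e30

typedef struct Edge { int to; double w; struct Edge *next; } Edge;

static Edge  *adj[MAXV];                 /* adjacency lists (filled elsewhere) */
static double dist[MAXV];
static int    parent[MAXV], done[MAXV], nv;

void dijkstra_2pin(int s, int t)
{
    int i, u;
    for (i = 0; i < nv; i++) { dist[i] = INF; parent[i] = -1; done[i] = 0; }
    dist[s] = 0.0;
    for (;;) {
        /* extract_min over all unexpanded nodes (linear scan for clarity) */
        u = -1;
        for (i = 0; i < nv; i++)
            if (!done[i] && (u < 0 || dist[i] < dist[u])) u = i;
        /* stop when the queue is empty or no node can still improve d(t) */
        if (u < 0 || dist[u] >= dist[t]) break;
        done[u] = 1;
        for (Edge *e = adj[u]; e != NULL; e = e->next)   /* relax neighbours */
            if (dist[u] + e->w < dist[e->to]) {
                dist[e->to] = dist[u] + e->w;
                parent[e->to] = u;
            }
    }
    if (dist[t] < INF) {                 /* backtrack via the parent pointers */
        printf("length %g, path (reversed):", dist[t]);
        for (u = t; u != -1; u = parent[u]) printf(" %d", u);
        printf("\n");
    }
}
```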

A* Algorithm
The A* algorithm is essentially a generalized best-first search strategy. These types of algorithms are also called labeling algorithms because, during the algorithm execution, status labels are attached to nodes. If a node is a candidate for expansion, then we label it with OPEN. If a node has been expanded, it is labeled with CLOSED. Labeling is not strictly required in an implementation, because making a node part of a set can be effectively equivalent to labeling. But for the sake of clarity and ease of discussion, labels may be very useful, and we will use labeling wherever appropriate. In the rest of this thesis we assume the global routing graph is connected. Before discussing A* we will state some definitions. The cheapest cost of a path between two nodes u and v is denoted by c*(u, v). In general we will use an asterisk to indicate a function which yields an optimal value, and a hat to indicate an estimating function. The algorithmic steps of A* are shown in Figure 7.7. The algorithm operates as
Input: routing graph G = (V, E) and two pins, source s and target t, to be routed
Output: a shortest path connecting source and target node

 1  g(s) <- 0; label(s) <- OPEN; insert(Q, s)
 2  while Q != {} do
 3      u <- extract_min(Q)                       (smallest f(u) = g(u) + h(u))
 4      label(u) <- CLOSED
 5      if u = t then break
 6      foreach v with (u, v) in E do
 7          if label(v) != OPEN and label(v) != CLOSED
 8          then g(v) <- g(u) + w(u, v); parent(v) <- u
 9              label(v) <- OPEN; insert(Q, v)
10          elsif label(v) = OPEN and g(v) > g(u) + w(u, v)
11          then g(v) <- g(u) + w(u, v); parent(v) <- u
12              decrease_key(Q, v)
13          elsif label(v) = CLOSED and g(v) > g(u) + w(u, v)
14          then g(v) <- g(u) + w(u, v); parent(v) <- u
15              label(v) <- OPEN; insert(Q, v)    (reopen v)
16          fi
17      od
18  od

Figure 7.7: The basic steps of the A* algorithm.


follows. Initially, all nodes are labeled INITIAL and the backtracking parent fields of all nodes are set to nil. The algorithm starts by labeling the source node s OPEN and puts it in queue Q. By definition, all elements in Q are labeled OPEN. Then the algorithm proceeds by selecting the best node, i.e. the node with the smallest value f(v), from Q, in the sense defined by

    $f(v) = g(v) + h(v)$   (7.1)

where

- g(v) is the sum of edge costs along the current path of backtracking pointers from v to the source node s;

- h(v) is the estimate of the cheapest cost of paths going from node v to the target node t.

If the selected node is the target node t then we have found an optimal path and the algorithm will terminate. If v != t then all neighboring nodes of v are evaluated with respect to their current shortest path distance to s and their estimated distance to t, and, if appropriate, backtracking information is updated and nodes are (re-)inserted into the queue (or, equivalently, labeled with OPEN). This procedure is repeated until the target node is found, which is guaranteed.

If we choose h(v) <= h*(v) for all v, where h*(v) denotes the actual cheapest cost from v to t, then (by definition) h is admissible, i.e. it is guaranteed that A* will yield an optimal solution [100, 101]. However, if we want to measure A*'s effectiveness by its ability to exclude as many nodes as possible from expansion, then admissibility alone is not sufficient. As proven in [101, 116], A* never reopens a CLOSED node under the following consistency condition:

    $h(u) \leq w(u, v) + h(v)$ for every edge $(u, v) \in E$   (7.2)

Note that consistency implies admissibility. If (7.2) holds, we have the following useful property [100]:

    $g(v) = g^*(v)$ for every CLOSED node $v$   (7.3)

which means that the backtracking path from every CLOSED node to the source node is a least cost path. For our rectangle-packing-derived global routing graph, we use

    $h(v) = d_M(v, t)$   (7.4)

where d_M(v, t) denotes the Manhattan distance between the target node t and v. Note that the distance measure need not necessarily be Euclidean or Manhattan in general. Actually, researchers tend to use a different (nonlinear) metric based on congestion and specific circuit constraints [108, 103]. However, usually this violates the consistency property, thus enlarging the practical average complexity of A*. It can be verified (using case distinctions) that (7.4) preserves consistency. The average computational complexity of the A* algorithm is substantially better than that of the SSSP algorithm. Typically, the complexity is proportional to the total number of nodes on the shortest path between the routed pins. In case we have n modules in the relevant


routing region, this results in an average computational complexity of O(sqrt(n)), using Claim 1 and Corollary 3. (We assume here that the relevant routing region has a squarish shape; if this is not the case, the average computational complexity is expected to be approximately linear in n.) Although A* is better than SSSP in terms of computational complexity, it is not always suitable for finding a 2-pin shortest path. For instance, in cases where the location of the target node is unknown, A* cannot be used and we have to resort to the SSSP algorithm.
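The only ingredients A* adds to the SSSP sketch above are node coordinates and the heuristic estimate of (7.4); a minimal sketch of that estimate is shown below. The coordinate arrays xc[] and yc[] are assumed to be filled in during graph construction.

```c
#include <stdlib.h>                  /* abs() */

#define MAXV 1024                    /* as in the SSSP sketch */

static int xc[MAXV], yc[MAXV];       /* node coordinates */

/* Consistent Manhattan estimate h(v) of (7.4). */
static double h_manhattan(int v, int t)
{
    return (double)(abs(xc[v] - xc[t]) + abs(yc[v] - yc[t]));
}

/* In the A* variant of the SSSP sketch, extract_min selects the node
 * with the smallest f(u) = g(u) + h(u) instead of g(u) alone:
 *
 *     f_u = dist[u] + h_manhattan(u, t);
 */
```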

7.6.2 Minimal Bounding Box (MBB) Routing


For completeness and comparison purposes, a commonly used global routing estimation method is discussed here. The method is very simple, as it estimates the global routing length of a net by taking the half perimeter of the smallest rectangle that encloses all pins of the net. This method is also called minimal-bounding-box (MBB) routing estimation. Clearly, this estimated value is an absolute lower bound on the length of the interconnect. It is also obvious that the actual error with respect to the optimal value can be very large. Two illustrative examples are shown in Figure 7.8. In cases where over-the-cell routing is not allowed, it can be seen that MBB routing estimation yields poor results in general. But even if over-the-cell routing is allowed, MBB routing performs poorly.


Figure 7.8: Two illustrative examples of the coarseness of minimal-bounding-box routing estimation; (a) shows a 2-pin net, (b) shows a 3-pin net. In each example the optimal-length path is contrasted with the half-perimeter-length path.

The total wire length is calculated by summing up the half-perimeter lengths of all nets. The computational complexity of this method is given by

    $O\bigl(\sum_{i=1}^{|N|} k_i\bigr)$   (7.5)

where k_i is the number of pins in net i and |N| is the number of nets. Since each pin is in exactly one net, the total complexity is O(p), with p the total number of pins. This computational complexity is very low. However, a major drawback of this method is the poor accuracy of the estimation. Therefore, it is not appropriate for use in our optimization framework, in which speed and accuracy are of utmost importance.
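For reference, the half-perimeter estimation of a single net amounts to no more than the following loop (a minimal sketch; the Pin struct is illustrative):

```c
typedef struct { int x, y; } Pin;

/* Half-perimeter (MBB) wire-length estimate for one net of k >= 1 pins. */
long mbb_estimate(const Pin *pin, int k)
{
    int i;
    int xmin = pin[0].x, xmax = pin[0].x;
    int ymin = pin[0].y, ymax = pin[0].y;
    for (i = 1; i < k; i++) {
        if (pin[i].x < xmin) xmin = pin[i].x;
        if (pin[i].x > xmax) xmax = pin[i].x;
        if (pin[i].y < ymin) ymin = pin[i].y;
        if (pin[i].y > ymax) ymax = pin[i].y;
    }
    return (long)(xmax - xmin) + (long)(ymax - ymin);  /* half perimeter */
}
```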


7.6.3 Minimum Spanning Tree (MST) Routing


An improvement in routing quality over the minimal-bounding-box estimation method is the computation of a minimum spanning tree (MST) in the plane over all pins of a given net. Note that this problem is fundamentally different from the Steiner minimal tree problem because no Steiner points are used. Similar to the Steiner minimal tree (SMT) problem in the plane, the minimum spanning tree problem ignores obstacles, which are the modules in the placement problem. A fundamental difference between the SMT problem in the plane and the MST problem is that the latter can be solved in polynomial time while the former is NP-hard. We do give in on solution quality, because the MST disregards the possible improvements in tree length obtainable by maximally overlapping subtrees. Another way to show the superficial analogy is to observe that a minimum-length tree spanning the set of pins of a net in a graph is called a Steiner minimal tree (in a graph). And as discussed before, finding such a tree is an NP-hard problem. Fortunately, the latter problem can be transformed in polynomial time to an easier-to-solve minimum spanning tree problem whose size is independent of the original graph size. Of course, the solution of the latter problem is not an optimum, but normally it is not too bad. The approach is as follows. Regard the set of pins as the nodes of a complete graph K, i.e. a graph in which each node is connected to every other node, with the weight of each edge equal to the Manhattan distance between the nodes it connects. Formally this is written as

    $w(u, v) = |x_u - x_v| + |y_u - y_v|$

where x_u is the x-coordinate of node u and y_u is the y-coordinate of node u. As we now have a graph in which the number of pins to be connected is equal to the number of nodes in the graph, Prim's minimum spanning tree algorithm can find a minimal tree connecting all k nodes in O(|E_K| + k log k) time, which is O(k^2) because K is complete. In terms of all nets to be routed, the total computational complexity is

    $O\bigl(\sum_{i=1}^{|N|} k_i^2\bigr)$   (7.6)

Note that the above procedure does not directly yield a solution in the form of a subgraph of the original routing graph; additional steps must be taken for this. Very recently, an efficient line sweep algorithm was introduced which computes a rectilinear MST of points in the plane in O(k log k) time [117], without the use of Delaunay triangulation. The latter is a well-known method for Euclidean MST computation in the plane, but it is not well defined for the Manhattan distance measure. Unlike the MBB estimation method, the MST error is bounded. Hwang [118] proved that the ratio of rectilinear MST cost over rectilinear SMT cost is never more than 3/2. However, experimental results reported in [119] indicate that the difference between the rectilinear MST cost and a solution produced by a good rectilinear Steiner minimal tree heuristic is more than 10% on average. This implies that the difference between the MST cost and the cost of an optimal routing solution in a graph is significantly more than 10%.
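The sketch below shows Prim's algorithm applied to the implicit complete graph over the pins of one net, with Manhattan edge weights. It runs in O(k^2) for k pins without ever building the complete graph explicitly; this is a minimal illustration, not the thesis implementation.

```c
#include <stdlib.h>   /* labs() */
#include <limits.h>   /* LONG_MAX */

#define MAXPINS 1024

typedef struct { int x, y; } Pin;

/* Length of a Manhattan MST over k >= 1 pins (Prim, O(k^2)). */
long manhattan_mst(const Pin *pin, int k)
{
    long total = 0;
    long best[MAXPINS];               /* cheapest connection to the tree */
    int  intree[MAXPINS], i, j, u;

    for (i = 0; i < k; i++) { best[i] = LONG_MAX; intree[i] = 0; }
    best[0] = 0;                      /* start the tree at pin 0 */
    for (j = 0; j < k; j++) {
        u = -1;                       /* pick the cheapest pin outside the tree */
        for (i = 0; i < k; i++)
            if (!intree[i] && (u < 0 || best[i] < best[u])) u = i;
        intree[u] = 1;
        total += best[u];
        for (i = 0; i < k; i++)       /* update connection costs */
            if (!intree[i]) {
                long d = labs(pin[i].x - pin[u].x) + labs(pin[i].y - pin[u].y);
                if (d < best[i]) best[i] = d;
            }
    }
    return total;
}
```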


7.6.4 Path-Based Routing


Path-based routing heuristics grow a tree from a given node in the routing graph by sequentially adding a path to a pin until the tree spans all pins. The expansion is typically based on the iterative addition of a shortest path between any node already in the tree and any pin not yet in the tree. There are many possibilities with respect to searching for a specific pin and connecting that pin to the tree built so far; hence the numerous path-based heuristics in the current literature. We will only cover the most interesting ones from our point of view. Although some differences between the given path-based heuristics are rather subtle, we demonstrate in Section 7.7 that the routing results can differ significantly. We start with the original shortest-paths heuristic (SPH) introduced by Takahashi and Matsuyama [120]. A greedy variant is proposed next, which is called the shortest-paths-based heuristic I (SPBH I). Another variant, which uses a constant-time distance estimation technique, is denoted by shortest-paths-based heuristic II (SPBH II). Finally, we discuss a general method to improve the solution quality of any heuristic by repetition with different starting points. Of course, this comes at the cost of increased computation time.

Shortest-Paths Heuristic (SPH)

Similar to the way in which Prim's minimum spanning tree algorithm constructively adds edges to a shortest spanning tree of all nodes in a graph, the shortest-paths heuristic (SPH) of [120] explores shortest path segments leading to pins. The shortest-paths tree is constructed iteratively by adding a shortest path from the current tree to a closest unconnected pin. The current tree can also be viewed as a virtual node to which the closest pin is to be connected. When all pins have been connected this way, the heuristic finishes and returns the tree it found. Let path(T, p) denote a path whose cost is minimal among all shortest paths from nodes in tree T to pin p, and denote the cost of path(T, p) by C(path(T, p)). Furthermore, let the set of pins be P = {p_1, ..., p_m}. The essence of the algorithm is given in Figure 7.9.
1. Begin with a subtree T_1 of G consisting of a single pin p_1 in P; set k <- 1.

2. Find a pin in P not yet in T_k, say p_j, such that

       $C(\mathrm{path}(T_k, p_j)) = \min\{\, C(\mathrm{path}(T_k, p)) : p \in P,\ p \notin T_k \,\}$   (7.7)

   Construct tree T_{k+1} by adding path(T_k, p_j) to the previous tree, i.e. set nodes(T_{k+1}) <- nodes(T_k) joined with the nodes of the path, and edges(T_{k+1}) <- edges(T_k) joined with the edges of the path.

3. If k + 1 = m then stop, else set k <- k + 1 and go to 2.

Figure 7.9: The shortest-paths heuristic. The number of pins is m.

As noted by Rayward-Smith and Clare [121], the final shortest-paths tree can be improved by two additional steps.
4. Determine a minimum spanning tree for the sub-network of G induced by the nodes in T_m.

5. Delete from this minimum spanning tree all non-pins of degree 1, in a sequential manner.


Although the last two steps can improve the quality of the solution, they impose a substantial increase in computation time. From experiments, we found that the improvement is usually negligible in our framework. The underlying reason is that the sub-network induced by the nodes of T_m is not likely to contain better solutions if the pins lie relatively far from each other (in terms of intermediate nodes) and the number of alternative, equally good paths is small. Generally, the shortest-paths heuristic yields good results [19, 122]. Furthermore, the worst-case computational complexity is

    $O(m \cdot (|E| + |V| \log |V|))$

for a single net with m pins. With the knowledge that for a planar graph the relationship |E| = O(|V|) holds, the worst-case computational complexity for all nets, with p the total number of pins, can be written as

    $O(p \cdot |V| \log |V|)$   (7.8)

Typically the complexity is significantly lower, because the addition of a new pin to the Steiner tree normally causes only a relatively small number of nodes, proportional to the number of nodes in the shortest path segment, to be re-processed. Furthermore, the error ratio (defined as the quotient of the worst-case solution quality and the optimal solution quality) is 2(1 - 1/m), where m is the number of pins in a net.

Shortest-Paths-Based Heuristic I (SPBH I)

The aforementioned shortest-paths heuristic does not add a pin to the currently built tree as soon as it finds a new pin. Instead, it makes sure that the path it adds to the current tree is indeed a shortest one over all possible paths from the current tree to this pin. A way to guarantee this condition is to postpone the addition of the currently found path to pin p until all edges connected to p have been explored, which implies that no improvement in path length is possible from the current tree to pin p. We propose a modification of the aforementioned algorithm which entails adding a shortest path to a pin as soon as we encounter this pin. This is essentially a greedy approach. We name this algorithm the shortest-paths-based heuristic I (SPBH I). Furthermore, we observe that an implementation of the shortest-paths heuristic of Figure 7.9 typically uses a priority queue to store the candidate nodes before extracting the cheapest ones (one at a time) during the pin search. As proven by Huijbregts [123, Corollary 4.1], upon first extraction of a candidate node from the queue to reach pin p, the actual shortest path to p may be established through some node that still resides in the queue, not necessarily the first extracted one to reach p. Therefore, a greedy approach does not comply with the SPH condition of (7.7). In Figure 7.10 the algorithmic steps are shown which feature SPBH I. The description of the algorithm in Figure 7.10 is self-explanatory. It is clear that the computational complexity of SPBH I is never more than that of the original shortest-paths heuristic.

1. Select an arbitrary pin s from which to grow tree T. Assign a value of 0 to the distance field of s and insert s into the empty priority queue Q. (Every other node in the graph has its distance field set to infinity and its status set to INITIAL.) Set k <- 1.

2. Extract a node u with minimum distance field from the queue Q. Change the status of node u to CLOSED. Explore every node v adjacent to u which does not have status CLOSED. For each such node change the status to OPEN. If v is an unconnected pin then go to step 4, otherwise go to step 3.

3. If the distance field associated with v is larger than the distance field of u plus the length of edge (u, v), then relax the distance field of v to the latter sum. Tag u as the parent node of v and store node v in the queue Q if it is not already there. Go to step 2.

4. Backtrack the path from node v to T by traversing the parent nodes. Meanwhile, add all traversed nodes and edges to T. Set the distance field of each traversed node to 0. Set k <- k + 1. If k = m then stop, else go to step 2.

Figure 7.10: The shortest-paths-based heuristic I. The number of pins in the net is m.

Figure 7.11 shows an example which demonstrates the different strategies of SPH and SPBH I. The purpose of this example is to show that the greedy behavior of SPBH I can yield worse solutions than the non-greedy behavior of SPH. However, as we will see from the experimental results, the average solution quality of SPBH I is significantly better than that of SPH for a wide range of problem instances. The explanation of Figure 7.11 is as follows. We want to connect all

(The four drawings of Figure 7.11 show a small routing graph with weighted edges: (a) example scenario, (b) subsequent snapshot, (c) SPBH I result, (d) SPH result.)

Figure 7.11: The difference in routing strategy between SPH and SPBH I results in different routing solutions. Starting from the situation drawn in (a) and subsequently (b), this example demonstrates a case in which (c) SPBH I is outperformed by (d) SPH.

The solid black circles constitute the pins of this net; the white circles are regular nodes. We start with one pin, and Figure 7.11(a) shows a snapshot of the situation in which two further pins have just been processed and added to the shortest-paths tree T. Consequently, all pins and nodes in T reside in the priority queue with their distance (key) values set to zero. Subsequently, the cheapest element is extracted from the queue and that node (or pin) is expanded. First, all nodes and pins in tree T are extracted from the queue, since they have a zero key value. During this process the two connected pins are also extracted and expanded. As a result, two regular nodes will be explored and put in the queue, keyed by their distance from tree T: one node gets key 13 and the other key 5. This situation is shown in Figure 7.11(b), where also the backtracking arrows are drawn. Both, and only, these two nodes reside in the queue at this moment, with keys 13 and 5, respectively. Therefore, the node with key 5 is extracted and expanded, and we find the remaining pin at distance 56 from T. Algorithm SPBH I stops at this point and returns the routing solution depicted by thick lines in Figure 7.11(c). However, algorithm SPH will not stop once this pin is encountered for the first time. Instead, it will find a shorter path via the node with key 13, of length 52 < 56. Clearly, the routing solution found by algorithm SPH is better than the one found by algorithm SPBH I. Despite this pessimistic scenario, the average performance of SPBH I compared to SPH in terms of routing quality is remarkably good, as will be shown in Section 7.7.

Shortest-Paths-Based Heuristic II (SPBH II)

Another variant of the original shortest-paths heuristic is as follows. Instead of determining a closest pin to the current tree by means of exploring adjacent nodes in an iterative manner, we estimate the distance to a closest pin using the Manhattan distance measure. We call this variant of the original shortest-paths heuristic the shortest-paths-based heuristic II (SPBH II). The algorithm is given in Figure 7.12.
1. Begin with a subtree T_1 of G consisting of a single pin p_1 in P; set k <- 1.

2. Find a pin in P not yet in T_k, say p_j, such that d_M(T_k, p_j) is minimal. Construct tree T_{k+1} by adding a shortest path from T_k to p_j to the previous tree, i.e. set nodes(T_{k+1}) <- nodes(T_k) joined with the nodes of the path, and edges(T_{k+1}) <- edges(T_k) joined with the edges of the path.

3. If k + 1 = m then stop, else set k <- k + 1 and go to 2.

Figure 7.12: The shortest-paths-based heuristic II. The number of pins in the net is m.

Here d_M(T, p) denotes the minimal Manhattan distance between pin p and all nodes in the current tree T. A significant improvement in run-time is obtained because the point-to-point A* routing algorithm [100, 101] can be used instead of Dijkstra's point-to-multipoint algorithm [99]. The shortest-paths-based heuristic II is somewhat more expensive, with respect to computational complexity, than SPH, although the A* algorithm is fundamentally faster than any other shortest-path algorithm. The underlying reason is that a straightforward implementation of the proposed algorithm processes all nodes in the currently built tree when evaluating d_M. Consequently, the actual computational complexity is

    $O\bigl(\underbrace{m \cdot |V|}_{\text{evaluate } d_M} + \underbrace{m \cdot (|E| + |V| \log |V|)}_{A^*\text{ routing with Fibonacci heaps}}\bigr)$   (7.9)

which is clearly upper-bounded by

    $O(m \cdot |V| \log |V|)$   (7.10)


In case the pins are spread over the entire routing graph, we obtain a worst-case scenario. However, in most cases the routing region of interest can be confined and is therefore substantially smaller. This implies that the practical complexity is much lower than (7.10) suggests. Note that in virtually all cases m is much smaller than |V|.

Repetitive Shortest-Paths Heuristics

For comparison purposes we also cover repetitive shortest-paths heuristics. A repetitive heuristic is not innovative in the sense that it makes use of a clever technique. On the contrary, it repeats a known heuristic over a given set of initial conditions. The solution of the repetitive heuristic is then the best solution returned by the underlying shortest-paths heuristic over all initial conditions. Essentially, the class of repetitive variants is quite broad, since in each heuristic some choice is made at some point in the algorithm. This can be an arbitrary choice, a greedy choice, or a choice based on some heuristic. Of course, a repetitive approach is only practical if a significant improvement in routing quality is obtained at the cost of a (preferably small) increase in run-time performance. The trade-off between the two depends on the application. When naively implemented, the overall run-time of a repetitive shortest-paths heuristic grows linearly with the number of repetitions. Therefore, a substantial amount of effort must be spent on clever techniques that exploit the incremental philosophy: only re-compute information when it is strictly necessary. From our point of view, the foremost reason for studying repetitive heuristics is to assess the practical improvement in routing quality. If the improvements are substantial, it proves worthwhile to consider the design of an efficient repetitive heuristic. Winter and Smith [122] have proposed a class of repetitive shortest-paths heuristics and conducted extensive experiments with them. Summarizing, this class consists of the following variants:

- SPH-N: determine SPH m times, each time beginning with a different pin.

- SPH-V: determine SPH |V| times, each time beginning with a different node.

- SPH-zN: determine SPH m - 1 times, each time beginning with a shortest path from a fixed pin z to another pin p, p != z.

- SPH-NN: determine SPH m(m - 1)/2 times, each time beginning with a shortest path between a different pair of pins.

The investigations of Winter and Smith showed that SPH-V and SPH-NN perform particularly well on all instances from a large set of randomly generated problem instances. The quality improvements are of course paid for by longer computation times. A possible method to reduce computation time while preserving routing quality is the heuristic identification of a good starting point. This could, for instance, be accomplished by close (visual) examination of many practical routing results. However, this falls outside the scope of this thesis.
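To make the repetition scheme concrete, the sketch below shows the simplest variant, SPH-N, as a driver loop. Net, net_pin_count() and sph_from() are hypothetical placeholders standing for the net data structure and for one run of the SPH of Figure 7.9 started from a given pin.

```c
#include <limits.h>

typedef struct Net Net;

extern int  net_pin_count(const Net *net);
extern long sph_from(const Net *net, int start_pin);  /* returns tree cost */

/* SPH-N: rerun SPH once per starting pin, keep the cheapest tree. */
long sph_n(const Net *net)
{
    long best = LONG_MAX;
    int  i, m = net_pin_count(net);
    for (i = 0; i < m; i++) {           /* one repetition per pin */
        long cost = sph_from(net, i);
        if (cost < best)                /* keep the best solution */
            best = cost;
    }
    return best;
}
```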


7.6.5 Node-Based Routing


The major difficulty when solving the Steiner minimal tree problem is to identify the non-pins (nodes which are not pins) that must be included in the tree to arrive at an optimal solution. Once these non-pins are given, the Steiner minimal tree can be found easily: it is a minimum spanning tree of the sub-network induced by the pins and the selected non-pins. The general idea behind so-called node-based heuristics is to identify good non-pins quickly. This stands in contrast to the earlier described path-based heuristics, where the idea is to identify a good path quickly. The average-distance heuristic (ADH) is a node-based heuristic, which is discussed in this thesis for two reasons. First, from experiments conducted by various researchers, ADH seems to perform better than shortest-paths heuristics on various kinds of graphs [19]. Therefore, it is interesting to verify whether this is also the case for our type of sparse (routing) graphs. Second, it is good to get an impression of how different types of routing heuristics differ in performance within our framework, both from the perspective of routing quality and of computation times.

Average-Distance Heuristic (ADH)

The average-distance heuristic is a promising algorithm which was first proposed by Rayward-Smith [124]. The heuristic is given in Figure 7.13.
1. Begin with a forest F_1 of subtrees of G, each consisting of a single pin, i.e. F_1 = {{p_1}, ..., {p_m}}; set k <- 1.

2. Determine f(v) for each v in V, and select a node v* for which f(v) is minimal. The function f measures the average distance between node v and the unconnected subtrees in F_k, denoted by T_1, T_2, ....

3. Construct forest F_{k+1} by adding to the previous forest the path from the closest subtree in F_k to v* and the path from the second-closest subtree in F_k to v*, i.e. joining the nodes and edges of both paths to those of the two subtrees. By this operation the two subtrees are effectively joined into a single subtree.

4. If all pins are contained in a single subtree then stop, else set k <- k + 1 and go to 2.

Figure 7.13: The average-distance heuristic. The number of pins is m.

Conceptually, the algorithm works as follows. We start with a set of unconnected pins. The idea is now to constructively connect pins to each other such that the number of unconnected pins is decreased. Since we connect two subtrees during each iteration, after a finite number of iterations all pins are connected. ADH distinguishes itself in the choice of which subtrees, each containing at least one pin, to connect and how they should be connected. The average-distance node is the node which has the smallest average distance to all currently constructed subtrees. After such a node is computed, the closest subtree and the second-closest subtree to that node are connected via two paths originating from the average-distance node. As a consequence of this step, both subtrees are merged into a single subtree connecting all pins it contains. The average-distance node is computed again, and the previous steps are repeated until all subtrees are connected.


Two additional steps, identical to steps 4 and 5 of the shortest-paths heuristic, can be applied to further improve the solution [121]. Formally, we can define the average distance function by

    $f_k(v) = \frac{1}{|F_k|} \sum_{T_i \in F_k} d(v, T_i)$   (7.11)

where d(v, T_i) is the shortest path distance between node v and the currently built subtree T_i. Furthermore, k is the iteration index as defined in the algorithm above, and m is the total number of pins. A fast implementation of ADH is due to Chang and Lee [4]. Their main contribution is the identification of circumstances in which more than two subtrees can be joined together in a single iteration, hence reducing the total number of iterations. Nonetheless, the computational complexity of ADH is dominated by the evaluation of the average-distance function (7.11) during each iteration. It can be verified that ADH has polynomial computational complexity for planar graphs [121]. Note that this complexity is a function of the total (confined) routing graph size and not of the number of pins. A significant portion of this complexity can be attributed to the computation of shortest paths. Motivated by the urge to reduce computational complexity, we propose a modified version of ADH hereafter, which we call the average-distance-based heuristic (ADBH).

Average-Distance-Based Heuristic (ADBH)

The essential difference between the previously discussed ADH and the modified algorithm which we propose here is the use of the Manhattan distance measure, as used in

    $\hat{f}_k(v) = \frac{1}{|F_k|} \sum_{T_i \in F_k} d_M(v, T_i)$   (7.12)

instead of the shortest-path distance measure used in (7.11). In addition, we use the A* algorithm to find an actual shortest path between a source and a target node. By virtue of the faster Manhattan distance approximation and the use of the A* algorithm to find a shortest path, we can reduce the worst-case total computational complexity somewhat. This can be seen as follows. The time taken to find the Manhattan distance between a node and a subtree is proportional to the number of nodes in this subtree. The maximum size of a subtree is of course never larger than the total number of nodes in the entire graph. Since all subtrees are disjunct, the total complexity for evaluating the distance from a given node to all subtrees is O(|V|). The function is evaluated in each of the m iterations for all nodes not yet in a subtree. Furthermore, the addition of the best average-distance node and the (two) paths leading to that node takes O(|V|) in the worst case, and this is done m times. Consequently, the total computational complexity is O(m |V|^2). Note that m is much smaller than |V| and normally does not depend on |V|. Because of this approximation it makes more sense to compare the results of ADBH with SPBH II instead of comparing it to the original SPH. As a final remark, we note that the average distance of a node only changes due to the change in distance to a merged (and extended) subtree. This fact can be exploited to yield a more efficient overall algorithm. However, this issue is not explored in this thesis.
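A sketch of the average-distance evaluation used by ADBH is given below, under the assumption that (7.12) takes the simple arithmetic-mean form reconstructed above; the Subtree type and its fields are illustrative.

```c
#include <stdlib.h>   /* labs() */
#include <limits.h>   /* LONG_MAX */

typedef struct { int n; int *x, *y; } Subtree;   /* n nodes with coordinates */

/* Average Manhattan distance from node (vx, vy) to ntrees >= 1 subtrees;
 * distance to a subtree = minimum over its (non-empty) node set. */
double adbh_avg_dist(int vx, int vy, const Subtree *tr, int ntrees)
{
    double sum = 0.0;
    int i, j;
    for (i = 0; i < ntrees; i++) {
        long dmin = LONG_MAX;
        for (j = 0; j < tr[i].n; j++) {
            long d = labs(vx - tr[i].x[j]) + labs(vy - tr[i].y[j]);
            if (d < dmin) dmin = d;
        }
        sum += (double)dmin;
    }
    return sum / ntrees;    /* evaluated for every candidate node v */
}
```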


7.7 Benchmarking of Heuristics in Our Routing Model


In this section we compare the following routing heuristics with respect to computational complexity and routing quality (we have in fact implemented and evaluated more heuristics, but the given set is sufficient to show the essence of our results):

- minimal bounding box (MBB),
- shortest-paths heuristic (SPH),
- shortest-paths-based heuristic I (SPBH I),
- shortest-paths-based heuristic II (SPBH II), and
- average-distance-based heuristic (ADBH).

Since worst-case performance of a heuristic can give an overly pessimistic indication of practical performance, and different heuristics perform differently on different problem instances, it is necessary to experimentally evaluate these heuristics on a set of representative problem instances. We first define the problem instances which we use to benchmark the routing heuristics. Then we evaluate these heuristics with respect to the following points:

- solution cost,
- percentage deviation from the optimal solution cost,
- computation time.

From these values we can derive some implications with respect to the following issues:

- worst-case solution quality as implied by the error ratio versus practical solution quality;
- computational complexity versus practical performance.

Finally, we draw some conclusions.

7.7.1 Benchmark Problem Instances


From the point of view of truly integrated placement and global routing, a natural requirement on the performance of a routing heuristic is that it performs well (on average) on a routing graph derived from a representative placement. During the initial phase of optimization, random placements are generated. Generally, a randomly generated placement is sparse, meaning that it contains a large amount of unoccupied space. It is quite plausible to assume that a routing graph derived from such a placement is certainly not easier to deal with than a routing graph derived from a very compact placement. For one, the graph is usually much larger. Hence, we may conclude that a heuristic which performs well (on average) on a broad


set of difficult graphs will also perform well on the easier graphs which are generated during the final stages of placement optimization. Thus, the results should give a good indication of typical routing performance.

Table 7.1 shows the benchmark set of global routing graph instances we have defined, along with optimal routing solutions. The numbers shown in the shaded area of the table are best known upper bounds at the time of writing, while the other numbers are optimal values.

Table 7.1: Information on the set of generated routing graph benchmark instances, stating the number of nodes |V|, the number of edges |E|, and the number of pins in each graph. Also the optimal solution values are shown in the 'optimal' column. The numbers in the shaded area are best known upper bounds (and thus those problems have, as yet, unknown optimal solutions).

name    |V|    |E|  pins  optimal      name    |V|    |E|   pins  optimal
lin01    53     80    4      503       lin20   3675   6709    11     6673
lin02    55     82    6      557       lin21   3683   6717    20     9143
lin03    57     84    8      926       lin22   3692   6726    28    10519
lin04   157    266    6     1239       lin23   3716   6750    52    17560
lin05   160    269    9     1703       lin24   7998  14734    16    15076
lin06   165    274   14     1348       lin25   8007  14743    24    17803
lin07   307    526    6     1885       lin26   8013  14749    30    21757
lin08   311    530   10     2248       lin27   8017  14753    36    20678
lin09   313    532   12     2752       lin28   8062  14798    81    32584
lin10   321    540   20     4132       lin29  19083  35636    24    23765
lin11   816   1460   10     4280       lin30  19091  35644    31    27684
lin12   818   1462   12     5250       lin31  19100  35653    40    33248
lin13   822   1466   16     4609       lin32  19112  35665    53    41444
lin14   828   1472   22     5824       lin33  19177  35730   117    58017
lin15   840   1484   34     7145       lin34  38282  71521    34    46244
lin16  1981   3633   12     6618       lin35  38294  71533    45    51996
lin17  1989   3641   20     8405       lin36  38307  71546    58    57849
lin18  1994   3646   25     9714       lin37  38418  71657   172   102733
lin19  2010   3662   41    13268

The problem instances are derived from placements with as few as 10 modules (lin01, lin02, lin03) to placements with 2560 modules (lin34, lin35, lin36, lin37). To give the reader a visual impression of such a problem instance, a visualization of lin23 is shown in Figure 7.14.

7.7.2 Experimental Results


The purpose of the experimental evaluation of the routing heuristics is to get a good idea of the practical performance, and thus the usefulness, of a certain heuristic in an iterative stochastic environment. The following five heuristics are benchmarked: MBB, SPH, SPBH I, SPBH II, and ADBH. The MBB heuristic is the most widely used method to estimate wiring requirements in connection with iterative placement optimization. The algorithms have been implemented in C. The hardware platform is a contemporary Linux 2.4 operating system running on an Intel Pentium III 800MHz processor with


512 Mbytes of RAM. All computation times are measured using the getrusage() system call.

Figure 7.14: An optimal solution to the routing problem instance lin23, consisting of 3716 nodes, 6750 edges and 52 pins. The total wire length is 17560.

Table 7.2 shows the experimental results of several routing heuristics on the previously defined set of global routing graph instances. It is clear from these results that MBB routing is the fastest of all, but the routing estimations it provides are disastrous; not only is the deviation from optimal very large, it also varies from -6% to as much as -80%. Also, we can see directly that ADBH performs very poorly, which is quite surprising. Because both run-time performance and solution quality are extremely poor for ADBH, we disregard it in our further discussion. We can also see that routing algorithm SPBH I performs best. Not only does it produce the highest quality results, which are not more than 3% away from the optimum (on average), but its computation times are also very modest. Figure 7.15 shows the solution of algorithm SPBH I to problem instance lin23. This result should be compared with the optimal solution shown in Figure 7.14. It should be noted that the shown routing solution, with length 18341, deviates 4.45% from optimal. However, this is barely assessable by visual inspection. The shaded rectangles are modules; 320 in total. Furthermore, the thin black lines are unexplored edges while the thin grey lines are explored edges. We can see that the right side of the plot contains a vertical region which has been left unexplored by the search wave.
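For reproducibility, the CPU-time measurement boils down to the standard POSIX getrusage() call; a small helper of the following form suffices (we show user time only, which is an assumption; the thesis does not state whether system time was included):

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Return the CPU user time consumed by this process, in seconds. */
double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
}
```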


Table 7.2: Experimental evaluation results of several routing heuristics. All times are in CPU seconds of a Linux 2.4 operating system running on an Intel Pentium PIII 800MHz system. The CPU times are averaged over 100 runs. The 0.00e+0 times (assumed equal to zero) were too small to measure.

                MBB                       SPH                      SPBH I                    SPBH II                    ADBH
name      cost   %opt    time       cost   %opt  time        cost   %opt  time        cost   %opt  time        cost    %opt    time
lin01      475   -5.57  0.00e+0      503   0.00  1.00e-4      503   0.00  1.00e-4      603  19.88  0.00e+0      503    0.00   1.00e-4
lin02      515   -7.54  0.00e+0      557   0.00  2.00e-4      557   0.00  1.00e-4      606   8.80  2.00e-4      606    8.80   4.00e-4
lin03      560  -39.52  0.00e+0      935   0.97  2.00e-4      932   0.65  2.00e-4      953   2.92  2.00e-4     1113   20.19   8.00e-4
lin04      975  -21.31  0.00e+0     1267   2.26  5.00e-4     1239   0.00  4.00e-4     1371  10.65  2.00e-4     1780   43.66   1.00e-3
lin05     1208  -29.07  0.00e+0     1813   6.46  7.00e-4     1770   3.93  5.00e-4     1849   8.57  3.00e-4     1957   14.91   1.90e-3
lin06      831  -38.35  0.00e+0     1393   3.34  6.00e-4     1348   0.00  4.00e-4     1579  17.14  5.00e-4     1476    9.50   3.40e-3
lin07     1328  -29.55  0.00e+0     1897   0.64  9.00e-4     1897   0.64  1.00e-3     1941   2.97  4.00e-4     2079   10.29   1.70e-3
lin08     1857  -17.39  0.00e+0     2649  17.84  1.10e-3     2280   1.42  9.00e-4     2347   4.40  5.00e-4     2711   20.60   4.30e-3
lin09     1695  -38.41  0.00e+0     3107  12.90  1.10e-3     2785   1.20  1.00e-3     3199  16.24  7.00e-4     3931   42.84   6.20e-3
lin10     2143  -48.14  0.00e+0     4773  15.51  1.40e-3     4294   3.92  1.20e-3     4463   8.01  9.00e-4     6202   50.10   1.53e-2
lin11     2730  -36.21  0.00e+0     4323   1.00  3.80e-3     4335   1.29  3.20e-3     4484   4.77  9.00e-4     6237   45.72   1.33e-2
lin12     3240  -38.29  0.00e+0     5314   1.22  4.10e-3     5354   1.98  3.80e-3     5587   6.42  8.00e-4     6379   21.50   1.72e-2
lin13     2741  -40.53  0.00e+0     4643   0.74  2.70e-3     4827   4.73  2.80e-3     5119  11.07  1.20e-3     6339   37.54   2.66e-2
lin14     3222  -44.68  0.00e+0     6971  19.69  4.00e-3     6020   3.37  3.70e-3     6517  11.90  1.60e-3    10385   78.31   5.85e-2
lin15     3330  -53.39  0.00e+0     7350   2.87  3.80e-3     7345   2.80  3.80e-3     9040  26.52  2.60e-3    12062   68.82   1.25e-1
lin16     4323  -34.68  0.00e+0     6688   1.06  1.08e-2     6865   3.73  1.16e-2     7221   9.11  1.50e-3     9250   39.77   5.61e-2
lin17     4526  -46.15  0.00e+0     9552  13.65  1.31e-2     9382  11.62  1.36e-2     9297  10.61  2.30e-3    12849   52.87   1.27e-1
lin18     4680  -51.82  0.00e+0    10414   7.21  1.26e-2    10665   9.79  1.23e-2    11765  21.11  3.60e-3    16497   69.83   2.22e-1
lin19     5581  -57.94  0.00e+0    14056   5.94  1.33e-2    13777   3.84  1.26e-2    15224  14.74  5.60e-3    26698  101.22   5.97e-1
lin20     4624  -30.71  0.00e+0     6968   4.42  1.90e-2     6808   2.02  1.81e-2     8100  21.38  3.00e-3    10062   50.79   9.16e-2
lin21     5360  -41.38  0.00e+0     9574   4.71  2.03e-2     9430   3.14  1.81e-2    10767  17.76  3.70e-3    14096   54.17   2.88e-1
lin22     5043  -52.06  0.00e+0    11337   7.78  2.07e-2    10898   3.60  1.99e-2    12081  14.85  4.40e-3    19339   83.85   5.87e-1
lin23     6668  -62.03  0.00e+0    19564  11.41  2.51e-2    18341   4.45  2.65e-2    20481  16.63  1.24e-2    42019  139.29   2.37e+0
lin24     9072  -39.82  1.00e-4    17655  17.11  5.93e-2    15929   5.66  6.04e-2    16694  10.73  7.60e-3    20017   32.77   4.69e-1
lin25     9428  -47.04  0.00e+0    19150   7.57  5.47e-2    18824   5.73  5.05e-2    21641  21.56  1.28e-2    34120   91.65   1.36e+0
lin26    10066  -53.73  0.00e+0    22480   3.32  5.84e-2    22516   3.49  5.58e-2    26353  21.12  1.21e-2    44279  103.52   2.24e+0
lin27     9406  -54.51  0.00e+0    22077   6.77  5.55e-2    21387   3.43  5.28e-2    24112  16.61  1.09e-2    44365  114.55   2.88e+0
lin28    10198  -68.70  0.00e+0    34582   6.13  5.76e-2    33642   3.25  5.32e-2    39125  20.07  4.01e-2    99705  205.99   1.56e+1
lin29    11904  -49.91  0.00e+0    24511   3.14  1.31e-1    24835   4.50  1.29e-1    27612  16.19  1.63e-2    46332   94.96   4.16e+0
lin30    14047  -49.26  0.00e+0    28940   4.54  1.41e-1    28780   3.96  1.33e-1    32506  17.42  1.81e-2    66750  141.11   7.28e+0
lin31    12984  -60.95  0.00e+0    34645   4.20  1.37e-1    33293   0.14  1.32e-1    37798  13.69  2.89e-2    68879  107.17   1.13e+1
lin32    14917  -64.01  1.00e-4    41606   0.39  1.44e-1    41544   0.24  1.39e-1    47818  15.38  3.47e-2   100909  143.48   2.16e+1
lin33    15975  -72.46  0.00e+0    61400   5.83  1.42e-1    58173   0.27  1.36e-1    65896  13.58  1.16e-1   169658  192.43   2.34e+2
lin34    21689  -53.10  1.00e-4    48368   4.59  3.16e-1    47810   3.39  3.17e-1    51373  11.09  3.44e-2   105095  127.26   3.79e+1
lin35    20189  -61.17  0.00e+0    52766   1.48  2.97e-1    52766   1.48  2.95e-1    61207  17.71  4.27e-2   123109  136.77   5.58e+1
lin36    19053  -67.06  0.00e+0    58853   1.74  3.04e-1    57849   0.00  2.70e-1    62596   8.21  5.45e-2   159935  176.47   1.60e+2
lin37    21420  -79.15  0.00e+0   105211   2.41  3.41e-1   103122   0.38  2.99e-1   119808  16.62  4.79e-1   355800  246.33   4.11e+3
average         -45.56  8.11e-6            5.70  6.48e-2            2.81  6.16e-2           13.69  2.58e-2           80.51   1.26e+2


Based on these results, which suggest SPBH I as a most promising routing heuristic, a few additional experimental investigations are conducted. We tested the following variations of SPBH I:


Figure 7.15: A near-optimal SPBH I solution to the routing problem instance lin23, consisting of 3716 nodes, 6750 edges and 52 pins. The total wire length is 18341, which deviates 4.45% from the optimal solution shown in Figure 7.14.

- ISPBH I ZZ: iterated version of SPBH I, but instead of starting with an arbitrary pin, we start m(m - 1)/2 times, each time with a shortest path between a different pair of pins;

- ISPBH I Z: iterated version of SPBH I, but instead of starting with an arbitrary pin, we start m times, each time with a different pin;

- ISPBH I Z BIAS: iterated version of ISPBH I Z, but whenever an arbitrary decision needs to be taken, this decision is biased towards extending an edge in the direction of the center of gravity of the net.

Table 7.3 shows the results of the experiments conducted with these algorithms. For the moment, ignore the ISPBH I Z BIAS results; we will explain why shortly. It is clear from the results in this table that ISPBH I ZZ produces overall the best results, with 1.4% deviation from the optimum on average. However, the computation times of ISPBH I ZZ are quite large, increasing rapidly with larger nets. Therefore, the faster ISPBH I Z is more suitable for use in an iterative framework, since the differences in routing quality are not that large. An average improvement over ISPBH I Z is obtained by ISPBH I Z BIAS without additional computational overhead, by exploiting biasing information where normally arbitrary decisions are made by the algorithms. This biasing technique can also be applied to ISPBH I ZZ


to improve the solution cost slightly, without increasing computation time. However, the computation time of ISPBH I ZZ is too large for the algorithm to be practical anyway. We see that the solution quality of ISPBH I Z BIAS is comparable with that of ISPBH I ZZ, while the computation time is two orders of magnitude lower.

Table 7.3: Experimental routing results obtained with variations on algorithm SPBH I. All times are in CPU seconds of a Linux 2.4 operating system running on an Intel Pentium PIII 800MHz system. The CPU times are averaged over 10 runs.

          SPBH I        ISPBH I ZZ                     ISPBH I Z                      ISPBH I Z BIAS
name        cost        cost   %opt   time             cost   %opt   time             cost   %opt   time
lin01        503         503   0.00  1.00e-3            503   0.00  1.00e-3            503   0.00  0.00e+0
lin02        557         557   0.00  4.00e-3            557   0.00  0.00e+0            557   0.00  1.00e-3
lin03        932         926   0.00  8.00e-3            926   0.00  1.00e-3            926   0.00  1.00e-3
lin04       1239        1239   0.00  1.20e-2           1267   2.26  2.00e-3           1239   0.00  2.00e-3
lin05       1770        1703   0.00  3.60e-2           1709   0.35  5.00e-3           1703   0.00  5.00e-3
lin06       1348        1348   0.00  6.30e-2           1348   0.00  5.00e-3           1348   0.00  5.00e-3
lin07       1897        1885   0.00  2.40e-2           1897   0.64  5.00e-3           1885   0.00  5.00e-3
lin08       2280        2252   0.18  7.80e-2           2252   0.18  9.00e-3           2248   0.00  9.00e-3
lin09       2785        2752   0.00  1.17e-1           2785   1.20  1.00e-2           2785   1.20  1.10e-2
lin10       4294        4248   2.81  4.01e-1           4256   3.00  2.00e-2           4204   1.74  2.00e-2
lin11       4335        4289   0.21  3.48e-1           4289   0.21  3.50e-2           4287   0.16  3.40e-2
lin12       5354        5301   0.97  5.81e-1           5314   1.22  4.70e-2           5314   1.22  4.40e-2
lin13       4827        4631   0.48  6.85e-1           4631   0.48  4.50e-2           4631   0.48  4.40e-2
lin14       6020        5981   2.70  1.55e+0           6056   3.98  7.50e-2           6049   3.86  7.80e-2
lin15       7345        7256   1.55  3.62e+0           7319   2.44  1.13e-1           7295   2.10  1.06e-1
lin16       6865        6696   1.18  1.53e+0           6688   1.06  1.39e-1           6664   0.70  1.41e-1
lin17       9382        8550   1.73  4.75e+0           8718   3.72  2.40e-1           8586   2.15  2.51e-1
lin18      10665       10044   3.40  7.61e+0          10133   4.31  3.13e-1          10184   4.84  3.13e-1
lin19      13777       13562   2.22  2.26e+1          13572   2.29  5.54e-1          13721   3.41  5.53e-1
lin20       6808        6677   0.06  2.15e+0           6887   3.21  2.26e-1           6717   0.66  2.22e-1
lin21       9430        9372   2.50  7.73e+0           9372   2.50  4.13e-1           9376   2.55  3.99e-1
lin22      10898       10726   1.97  1.51e+1          10758   2.27  6.03e-1          10675   1.48  6.05e-1
lin23      18341       17983   2.41  6.66e+1          18079   2.96  1.33e+0          18029   2.67  1.30e+0
lin24      15929       15518   2.93  1.49e+1          15856   5.17  1.04e+0          15519   2.94  1.05e+0
lin25      18824       18615   4.56  3.11e+1          18803   5.62  1.38e+0          18426   3.50  1.35e+0
lin26      22516       22358   2.76  5.43e+1          22430   3.09  1.91e+0          22218   2.12  1.89e+0
lin27      21387       21427   3.62  7.51e+1          21569   4.31  1.92e+0          21240   2.72  2.13e+0
lin28      33642       33583   3.07  3.91e+2          33655   3.29  4.26e+0          33663   3.31  4.83e+0
lin29      24835       24339   2.42  7.50e+1          24339   2.42  3.12e+0          24329   2.37  3.40e+0
lin30      28780       28463   2.81  1.38e+2          28517   3.01  4.29e+0          28601   3.31  4.61e+0
lin31      33293       33516   0.81  2.32e+2          33668   1.26  5.67e+0          33248   0.00  5.97e+0
lin32      41544       41541   0.23  4.38e+2          41516   0.17  7.89e+0          41444   0.00  8.30e+0
lin33      58173       58194   0.31  1.90e+3          58454   0.75  1.67e+1          58017   0.00  1.72e+1
lin34      47810       46886   1.39  3.57e+2          47300   2.28  1.16e+1          46244   0.00  1.18e+1
lin35      52766       51996   0.00  5.97e+2          52134   0.27  1.43e+1          52651   1.26  1.48e+1
lin36      57849       58386   0.93  1.00e+3          58531   1.18  1.71e+1          57960   0.19  1.74e+1
lin37     103122      102781   0.05  9.86e+3         102710  -0.02  5.57e+1         102733   0.00  5.62e+1
average                        1.36  4.13e+2                  1.92  4.08e+0                  1.38  4.19e+0

Summarizing, SPBH_I and ISPBH_I_Z_BIAS are the most promising candidates for routing in an iterative optimization framework.12 Whether or not it is worth trading off computation time against solution quality depends, among other factors, on the typical size of a net. For comparison purposes it is interesting to contrast the heuristic graph SMT results with optimal rectilinear SMT (RSMT) and Euclidean SMT (ESMT) results. Recall that the RSMT and ESMT13 solutions ignore modules in the plane. Consequently, wires can run over modules, which is by definition undesirable. However, the results do give a good indication of how much solution quality we lose by imposing a non-over-the-cell-routing constraint. The optimal results have been obtained using Geosteiner 3.0, written by Warme et al. [125], which is considered a state-of-the-art tool for computing RSMTs and ESMTs. Table 7.4 summarizes the outcomes of the experiments, which are performed on the same hardware platform as the other routing experiments. The average improvement of RSMT solutions over the near-optimal heuristic ISPBH_I_Z_BIAS solutions is 6.6%.

12 In principle, it is also possible to apply biasing techniques to SPBH_I, thereby improving solution quality while not enlarging computation time.
13 The ESMT is similar to the RSMT except for the fact that edges are not restricted to horizontal and vertical directions.


This means that RSMT lengths are about 5% shorter than optimal graph SMT lengths. Of course, ESMT improves on these results. Note that the CPU times of RSMT and ESMT are orders of magnitude smaller than the CPU times of the ISPBH_I_Z_BIAS heuristic and comparable with the CPU times of the SPBH_I heuristic.

Table 7.4: Experimental routing results for the RSMT and ESMT problems obtained using the Geosteiner 3.0 tool. All times are in CPU seconds on a Linux 2.4 operating system running on an Intel Pentium III 800 MHz system. The columns headed by % dev. give the deviation of the solution cost with respect to the ISPBH_I_Z_BIAS solution, not with respect to the optimal solution.
          ISPBH_I_Z_BIAS     |        RSMT                 |        ESMT
name      cost     time      |  cost    % dev.   time      |  cost    % dev.   time
lin01     503      0.00e+0   |  501     -0.40    0.00      |  430     -14.51   0.00
lin02     557      1.00e-3   |  557     0.00     0.00      |  466     -16.34   0.00
lin03     926      1.00e-3   |  831     -10.26   0.00      |  711     -23.22   0.01
lin04     1239     2.00e-3   |  1089    -12.11   0.01      |  1003    -19.05   0.03
lin05     1703     5.00e-3   |  1550    -8.98    0.01      |  1421    -16.56   0.02
lin06     1348     5.00e-3   |  1286    -4.60    0.01      |  1150    -14.69   0.20
lin07     1885     5.00e-3   |  1852    -1.75    0.00      |  1618    -14.16   0.02
lin08     2248     9.00e-3   |  2072    -7.83    0.00      |  1861    -17.22   0.03
lin09     2785     1.10e-2   |  2575    -7.54    0.00      |  2283    -18.03   0.06
lin10     4204     2.00e-2   |  3789    -9.87    0.02      |  3230    -23.17   0.16
lin11     4287     3.40e-2   |  4108    -4.18    0.00      |  3475    -18.94   0.05
lin12     5314     4.40e-2   |  4986    -6.17    0.01      |  4460    -16.07   0.08
lin13     4631     4.40e-2   |  4282    -7.54    0.00      |  3804    -17.86   0.10
lin14     6049     7.80e-2   |  5637    -6.81    0.00      |  4719    -21.99   0.11
lin15     7295     1.06e-1   |  6794    -6.87    0.01      |  5996    -17.81   0.43
lin16     6664     1.41e-1   |  6374    -4.35    0.00      |  5542    -16.84   0.03
lin17     8586     2.51e-1   |  8102    -5.64    0.01      |  7117    -17.11   0.19
lin18     10184    3.13e-1   |  9155    -10.10   0.01      |  7929    -22.14   0.39
lin19     13721    5.53e-1   |  12664   -7.70    0.03      |  11271   -17.86   1.70
lin20     6717     2.22e-1   |  6512    -3.05    0.01      |  5669    -15.60   0.04
lin21     9376     3.99e-1   |  8611    -8.16    0.00      |  7602    -18.92   0.11
lin22     10675    6.05e-1   |  10034   -6.00    0.01      |  8797    -17.59   0.26
lin23     18029    1.30e+0   |  16864   -6.46    0.12      |  14720   -18.35   1.57
lin24     15519    1.05e+0   |  14682   -5.39    0.01      |  12812   -17.44   0.19
lin25     18426    1.35e+0   |  16730   -9.20    0.00      |  14603   -20.75   0.16
lin26     22218    1.89e+0   |  21214   -4.52    0.02      |  18494   -16.76   0.46
lin27     21240    2.13e+0   |  19854   -6.53    0.07      |  17584   -17.21   1.90
lin28     33663    4.83e+0   |  31158   -7.44    0.62      |  27127   -19.42   4.91
lin29     24329    3.40e+0   |  22894   -5.90    0.01      |  20161   -17.13   0.17
lin30     28601    4.61e+0   |  26796   -6.31    0.01      |  22989   -19.62   0.44
lin31     33248    5.97e+0   |  30430   -8.48    0.02      |  26690   -19.72   1.02
lin32     41444    8.30e+0   |  38351   -7.46    0.23      |  33186   -19.93   1.41
lin33     58017    1.72e+1   |  53874   -7.14    0.80      |  46723   -19.47   7.63
lin34     46244    1.18e+1   |  42953   -7.12    0.02      |  38553   -16.63   0.95
lin35     52651    1.48e+1   |  48865   -7.19    0.03      |  41635   -20.92   0.85
lin36     57960    1.74e+1   |  53390   -7.88    0.14      |  47118   -18.71   2.12
lin37     102733   5.62e+1   |  95217   -7.32    1.44      |  82758   -19.44   7.66
average            4.19e+00  |          -6.60    9.95e-02  |          -18.30   9.58e-01

7.7.3 Concluding Remarks


We have shown that efficient graph-based routing heuristics are suitable for finding near-optimal approximations to the graph Steiner minimal tree problem. The best heuristics in terms of solution quality and running time are SPBH_I and ISPBH_I_Z_BIAS. The former can be improved somewhat by using a biasing technique similar to that of the latter.


Since SPBH_I is orders of magnitude faster than ISPBH_I_Z_BIAS, it is certainly more suitable during the initial phase of the simulated annealing optimization. Another issue worth investigating with respect to improving run-time performance is the idea of multiple wave expansion, as elaborated in [123] but explored there in a somewhat different context. We already pointed out that the Geosteiner 3.0 tool produces optimal solutions much faster than our heuristics produce sub-optimal ones. This seems paradoxical, but it should be noted that Geosteiner 3.0 is strongly optimized while our heuristic code is not. However, Geosteiner 3.0 cannot compute optimal Steiner minimal trees in graphs, which is an important requirement in our framework. As yet, no previously published results are known on fast heuristics for finding near-optimal Steiner minimal trees in graphs derived from actual module placements. Moreover, little was known about their absolute performance in relation to optimal solutions. Our work on routing has filled this gap. Last but not least, a rigorous optimization of the heuristic code should significantly improve run-time performance, resulting in a very fast global routing heuristic that yields near-optimal results.

7.8 Incremental Routing


As mentioned earlier, incremental update techniques are of paramount importance in an iterative optimization environment where the number of iterations can be very large. Clearly, when only a small change in the placement of modules occurs, at least the nets connected to the modules that actually changed location have to be re-computed. In addition, the nets that are routed through the region of affected modules should be re-computed in order to maintain consistently good global routes for all nets. Thus, the total set of affected nets consists of two subsets. The first subset contains the nets which have at least one pin connected to any of the moved modules; this set is identified implicitly via computation of all moved modules. The second subset contains the nets which have no pin connected to any of the moved modules, but have a routing segment running through the region of moved modules. We cover these cases separately in the following subsections.

7.8.1 Re-routing Nets Connected to Moved Modules


When a module moves due to a perturbation operation, the global routing graph is always affected and thus has to be updated. The next step is to determine which nets have to be re-routed. Quite obviously, all nets connected to any of the moved modules need a routing update. A straightforward incremental algorithm for doing this is as follows.

1. Enumerate all nets that have to be considered for a routing update by traversing all moved modules.

2. For each enumerated net, re-compute the global routing using a pre-defined routing heuristic.

Since we want to compute the total wire length of all nets in an incremental fashion, the above algorithm should be supplemented with the following step.


3. Incrementally update the total wire length by subtracting the length of the net before the perturbation and adding the length of the re-computed routing of that net after the perturbation.

Unfortunately, a nasty problem arises here which is not easily discovered. The underlying reason is the following. In all of our global routing heuristics we use a priority queue to store candidate nodes for expansion. One property of priority queues that has been left untouched up to now is the action to be taken when multiple elements with the same key exist in the queue. We assume, as is the default in these cases, that when the priority queue has to decide which element to choose from a set of elements with equal keys, it makes an arbitrary choice among them. Generally, there is no reason to deviate from this (usually implicit) assumption. However, in the present context we can easily sketch a scenario in which an arbitrary choice is unwanted.

Figure 7.16(a) shows a part of a global routing graph in which a three-pin net has to be routed; the routing starts from the first pin. After a few steps we arrive at a node w, which is explored and put in the priority queue. When the algorithm extracts w from the queue for expansion, it finds a node u, relaxes it, and puts it in the queue, keyed by its distance from the start pin. A node v is found subsequently, relaxed, and put in the queue, also keyed by its distance from the start pin. Since the edge to u has the same length as the edge to v, the nodes u and v reside in the queue with the same key value, say k. When k is the smallest key in the queue and an extract-min queue operation is issued, it is not clear a priori whether element u or element v will be returned. Since we have implemented the priority queue with a splay tree (see Chapter 5), the priority queue is fully deterministic. Therefore, even though we cannot predict which of the elements u and v is going to be extracted first, when the priority queue is built up using the same sequence of elements and operations, the choice between u and v is arbitrary but static.

Figure 7.16(b) shows the case where u is extracted first from the priority queue; u then becomes the parent of the next explored node, because that node is reached first via u. This eventually results in the routing of the three-pin net shown in Figure 7.16(c): when the second pin is found, the backtrack pointers (shown as arrows in the figure) are followed back through the nodes explored via u, and the pin is connected to the routing tree using the traversed edges. On the other hand, when the priority queue is built up using a different sequence of elements and operations, for example due to the presence of an additional module, it is possible that element v is extracted first instead of element u (even if both distance values are equal again). This eventually results in the routing solution shown in Figure 7.16(d). Due to the importance of this observation, we formulate it in the following theorem.

Theorem 9 The exact topology of a balanced search tree, such as the splay tree (see Chapter 5), depends on both the set of stored tree elements and the order of the performed tree operations.

Note that even the addition of a single element, directly followed by the deletion of that element, results in a different operation sequence and thus a possibly different topology of the tree structure. The scenario in which a node is added to an already existing routing tree is easily conceivable.
For instance, when a module changes position, its associated escape lines can induce new edges and nodes in the routing graph. It is important to note that this might occur without re-routing of the routing tree, simply because we know in advance that the routing tree should not change (it neither runs through the affected region nor is directly connected to any of the affected modules).


Figure 7.16: An example scenario which demonstrates that, even with the deterministic priority queue used in our routing heuristic, the routing result depends on the environment of the modules that are of direct interest for the net to be routed. In this example, a difference in the extraction order of two nodes keyed with the same distance value yields two routing results with different lengths (thick lines); (c) versus (d).

Essentially, the arbitrary breaking of ties (between elements with the same key in the queue) causes this unwanted behavior of the routing algorithm. Although the tie-breaking choice has no obvious preference at that specific moment, it is likely to affect the final outcome of the routing solution (and consequently the routing length). The choices we have for breaking a tie are:

- choose the node independently of the contents and structure of the balanced search tree;


- choose the node dependent on the contents and structure of the balanced search tree.

Clearly, we must select the first choice. A practical implementation could break a tie by choosing the node closest to the center of gravity of all pins in the net. For clarity, we show next what happens when a naive tie-breaking approach is adopted. It is clear from the previous discussion that the routing result of a net depends on the environment around the region defined by the modules connected to that net. Therefore, incremental re-routing is badly affected, because the total wire length obtained by incremental means can deviate without bound from the real total wire length. This can be seen from the algorithm shown in Figure 7.17.
1  while i < N do
2      perturb
3      W ← W − ℓ
4      re-route the net; update ℓ
5      W ← W + ℓ
6      if rejected
7      then
8          undo perturb
9          W ← W − ℓ
10         re-route the net; update ℓ
11         W ← W + ℓ
12     i ← i + 1
13 od

Figure 7.17: A simplified simulated-annealing-based algorithm which we use to demonstrate the effect of environment-dependent re-routing results. W stands for the total wire length, ℓ for the length of the single net under consideration (ℓ_k denoting its k-th computed value), and i, N are integers with 0 ≤ i ≤ N.

It is clear that essentially the operations on lines 3 to 5 are equal to the operations performed on lines 9 to 11. Using an equivalent representation for lines 3 to 5, in the form of

W ← W − ℓ_k + ℓ_{k+1},

it can be easily seen that a generation directly followed by a rejection essentially computes W ← W − ℓ_k + ℓ_{k+1} − ℓ_{k+1} + ℓ_{k+2}, which can be written as W ← W − ℓ_k + ℓ_{k+2}. Consequently, considering a pure generation-rejection scenario, i.e. the condition on line 6 always evaluating to true, the algorithm of Figure 7.17 can be simplified to the algorithm shown in Figure 7.18.
1 while i < N do
2     W ← W − ℓ_i + ℓ_{i+1}
3     i ← i + 1
4 od

Figure 7.18: Simplified generation-rejection algorithm.

From this simple algorithm we can directly see that, in the case of an ideal generation-rejection operation, ℓ_{i+1} = ℓ_i should hold; in effect, nothing happens to W when the loop is iterated.


However, from the previous discussion we know that the routing heuristic can cause ℓ_{i+1} ≠ ℓ_i. It is evident that W can then start drifting uncontrollably. Hence, this unwanted effect renders the optimization algorithm useless when straightforward incremental routing techniques are applied. The previously discussed solution for resolving environment-dependent routing results is to use some sort of biasing technique which decides in a predictable, balanced-tree-structure-independent manner in case of ties. A biasing technique which also gives good routing results is preferred, of course, but it is difficult to measure the quality of such a technique in a general context. An implementation of this technique could be as follows.
1. Extract all equal-keyed nodes from the queue and put them in a separate data structure D. Then compute a unique center of gravity c, which will be the reference point for the net in the current topology.

2. Choose the node from D with the smallest x- or y-distance from c. In case of ties, choose the node with the smallest x-coordinate first. If there is still a tie, choose the node with the smallest y-coordinate to resolve all remaining ties.

Actually, D is not strictly necessary for an efficient implementation. We can simply process the extracted nodes sequentially (in any order) and decide whether to adjust the parent node of an explored node whose distance equals the current exploration distance. The final decision is based on the criteria mentioned in step 2 of the previous algorithm; the validity of the sequential approach is a direct consequence of the associativity of the min-operator. This method can be simplified further by discarding the center-of-gravity information and assigning fixed priorities to each of the four edges departing from a node; the edge with the higher priority always has preference. Since there is no good reason to assume that the more sophisticated approach performs better in practice in terms of routing quality, we choose the simplest approach, which is also the fastest.
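To make the chosen tie-breaking rule concrete, the following minimal C sketch shows a comparison function that could back the priority queue; the cand_t record and the direction encoding are illustrative assumptions, not the actual thesis data structures.

/* Candidate node during wave expansion (illustrative record). */
typedef struct {
    long key;   /* distance from the start pin                          */
    int  dir;   /* edge via which the node was reached: 0=N,1=E,2=S,3=W */
} cand_t;

/* Fixed, tree-structure-independent edge priorities: N > E > S > W. */
static const int edge_prio[4] = { 3, 2, 1, 0 };

/* Returns nonzero when a must be extracted before b.  Equal keys are
 * resolved by the fixed edge priority, never by the internal layout of
 * the (splay-tree) priority queue, so re-routing an unchanged net is
 * reproducible and the total wire length cannot drift.                */
int cand_before(const cand_t *a, const cand_t *b)
{
    if (a->key != b->key)
        return a->key < b->key;
    return edge_prio[a->dir] > edge_prio[b->dir];
}

With such a comparator, two runs that insert the same candidates always extract them in the same order, regardless of how the underlying balanced tree happens to be shaped.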

7.8.2 Re-routing Affected Nets Not Connected to Moved Modules


Unfortunately, moved modules do not only trigger re-routing of the nets directly connected to them, but also of nets that are routed through the region containing the moved modules. Two issues play a significant role here.

1. Identification of all affected nets, with a focus on the nets that are routed through the region with moved modules without being connected to any of the moved modules.

2. Efficient re-routing of these nets, such that the quality of the obtained routing is not affected in a negative sense.

Clearly, when a routing segment runs along a side of a moved module, it should be considered for re-routing after moving the module. We propose two approaches to enumerate these nets efficiently.

- Enumerate all nets that run along any of the sides of the moved modules. Since we accumulated all global routing information and assigned it to the appropriate module boundaries in a previous iteration, it is a straightforward task to perform this enumeration without incurring additional computational complexity.


- Take into account only the modules at the perimeter of the affected region, i.e. the region containing the moved modules. If a routing segment of a net runs into this affected region, that segment needs re-routing. Since each affected net that is not connected to any of the moved modules must cross this perimeter, it is sufficient to process all perimeter modules.

A clear drawback of the first method is that all moved modules must be processed in order to find all affected nets. When the number of moved modules becomes larger, it becomes more advantageous to consider solely the perimeter modules, since the number of perimeter modules grows roughly proportionally with the square root of the number of moved modules. Furthermore, the second method immediately identifies the exact boundary locations of the routing segments that penetrate the affected region; the usefulness of knowing these boundary locations will become clear shortly. As a result of the previous discussion, the second method is preferred over the first.
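As an illustration of the second method, the following C sketch walks only the perimeter modules and collects the nets whose stored routing segments enter the affected region; the types and the two extern helpers are assumptions for illustration, not the thesis' actual interfaces.

#include <stddef.h>

typedef struct net net_t;

typedef struct segment {
    net_t          *net;     /* net owning this global routing segment */
    struct segment *next;
} segment_t;

typedef struct module {
    segment_t *side_segs[4]; /* segments assigned to the four sides    */
} module_t;

extern int  enters_affected_region(const segment_t *s);  /* assumed helper */
extern void mark_for_rerouting(net_t *n);                /* assumed helper */

/* Every indirectly affected net must cross the perimeter, so visiting
 * the perimeter modules alone finds all such nets, together with the
 * boundary locations of the penetrating segments.                     */
void collect_indirectly_affected_nets(module_t **perim, size_t n_perim)
{
    for (size_t i = 0; i < n_perim; i++)
        for (int side = 0; side < 4; side++)
            for (segment_t *s = perim[i]->side_segs[side]; s != NULL; s = s->next)
                if (enters_affected_region(s))
                    mark_for_rerouting(s->net);
}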

Full Re-routing of Indirectly Affected Nets

With respect to the nets that cross the affected region defined by the moved modules, the simplest way to recompute the routing of a net is to compute the entire routing again. Using this approach, a high-quality routing is generally maintained, at the expense of a higher computational complexity. In the case of large nets with many pins, this approach may burden the overall algorithm considerably, since most of the pins, and thus the largest part of the routing segments, lie in the unaffected region.

Partial Re-routing of Indirectly Affected Nets

A method to reduce the computational overhead of fully re-routing affected nets is partial re-routing: essentially, only that part of the routing is re-computed that lies in the affected region. Besides a beneficial reduction in computational complexity for larger nets, there is an additional advantage with respect to the quality of a net. If the boundary crossings of the routing segments of a specific net are considered as virtual pins, we can actually gain routing quality by connecting the virtual pins in a Steiner-minimal-tree-like manner. However, when naively viewing all virtual boundary pins induced by a net as pins of a single subnet which needs to be routed, a lurking danger is the introduction of loops in the interconnect of the total net. Clearly, measures should be taken to avoid the occurrence of loops, since these are unwanted by definition.14 A way to solve this problem is to keep track of the subsets of boundary pins that become disconnected due to the removal of routing segments in the affected region. Consequently, partial re-routing of affected nets can be favourable over full re-routing, especially if the partial portion is relatively small compared to the total routing of the net.
14 Under some circumstances it might actually be desirable to introduce loops, motivated by electromagnetic considerations, but this topic is outside the scope of this thesis.


7.9 Impact of Routing on Placement Quality


Both placement and routing are NP-hard problems when considered separately. When these problems are combined, as we should do because of their strong interdependencies, finding a solution certainly does not become easier. Therefore, many researchers tend to neglect or oversimplify part of the problems involved. It is interesting to investigate whether or not these simplifications are justified, and how much we have to pay for them in terms of solution quality. In this section it is shown, via experimental results, that the de facto standard way of estimating wire length, the minimal bounding box (MBB) method, does not result in high-quality placements. Moreover, the use of a more accurate routing heuristic yields significantly better final (global) routing results.

7.9.1 Integrated Placement and Routing


The integrated placement and global routing concept has been embedded in a robust simulated annealing optimization framework. The main idea of the overall algorithm is shown in Figure 7.19.
Input: netlist N and a set of blocks with given sizes
Output: a (near) optimal solution s

s ← random initial state
while not frozen do
    generate a new candidate solution s′
    compute packing
    compute chip area
    compute routing graph
    foreach net n ∈ N do
        insert pins of net n
        estimate route of net n
    od
    if accept(Δcost, T) ≥ random[0, 1) then s ← s′
    adjust temperature
od

Figure 7.19: A simplified simulated annealing algorithm with integrated placement and global routing optimization.

Given the previous discussion of the construction of the global routing graph and the extended global routing graph, the above algorithm speaks for itself. Based on the earlier experimental results with respect to the routing heuristics, we have chosen the shortest-paths-based heuristic I (SPBH_I), which gives near-optimal routing results quickly.


This heuristic is used in conjunction with a non-incremental placement computation algorithm. Our main purpose is to show the impact of routing quality on placement quality when both are weighted equally, i.e. α = β, in an optimization environment. We have not put any effort into minimizing run time beyond the most obvious implementation choices.
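For reference, a sketch of the corresponding cost evaluation in C, assuming the weighted-sum form referred to as (4.3); the normalization by reference values is an added assumption, used here only to keep both terms comparable in magnitude.

/* Weighted placement-and-routing cost; alpha and beta are the chip-area
 * and total-wire-length weights.  In the experiments below, alpha == beta. */
double placement_cost(double chip_area, double total_wl,
                      double alpha, double beta,
                      double area_ref, double wl_ref)
{
    return alpha * (chip_area / area_ref) + beta * (total_wl / wl_ref);
}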

7.9.2 Experimental Results


We have implemented the integrated placement and routing optimization framework in C, based on [48, 35]. The platform is SuSE Linux 7.0, running on a PIII@800MHz with 512 MB RAM. We use the best solution out of three runs with different random seeds, unless noted otherwise. A set of randomly generated problem instances is used, i.e. random netlists with at most one pin per block side and random module sizes from a given range. The largest MCNC benchmark, ami49, has been included too [93]. The primary purpose of including a commonly used benchmark is to set a reference level for comparison with existing works. Note that the MCNC benchmark data has been adapted slightly to be comparable with existing results and to make placement optimization easier, because we do not have to put additional boundary constraints on any module. The reason for not including more MCNC benchmarks is that we do not allow over-the-cell routing and thus cannot handle the in-cell pins of some MCNC benchmarks. The experimental setup is as follows.

1. We search for a (near) optimal packing without considering routing. This is mainly for comparison purposes with existing results and to show the impact of routing on placement quality.

2. We search for a (near) optimal packing using MBB routing. For each final packing, we compute the SPBH_I solution and check if this is consistent with the MBB estimation.

3. We search for a (near) optimal packing using SPBH_I routing. For each final packing, we compute the MBB solution and check if it is consistent.

Table 7.5 shows our main results on the randomly generated benchmark data and the MCNC benchmark. It is interesting to note that the final chip area of packing optimization without routing for ami49, being 36.48 mm², is better than any previously reported result [69, 62, 47]. This implies that the proposed optimization framework has excellent convergence behavior, resulting in near-optimal solutions in reasonable time without any tuning. Note that the intention of the experiments is to show the impact of routing quality on placement quality; therefore, no effort has been devoted to minimizing computation times, and attention should be focused on relative differences. Let us take a closer look at the last two columns of Table 7.5. The results in the column MBB are the final total wire-length values measured by the MBB estimation method. The results in the column SPBH_I are the final total wire-length values measured by algorithm SPBH_I. We clearly see a non-trivial difference when we compare the upper six values with the lower six values in the MBB column. This also holds for the values in the last column. A most remarkable observation is that in almost all cases an increase in MBB value occurs when a decrease in SPBH_I value is obtained. Consequently, we may conclude that the correlation between MBB and the more accurate SPBH_I global routing is very poor. Moreover, MBB routing used in the optimization loop does not lead to a placement which will be routable with a minimum amount of wire length (avoiding obstacles).


Table 7.5: Placement optimization results on a set of problem instances. All values are best of three runs. The parameters α and β denote the chip area weight and the total wire-length weight, respectively (see (4.3)).

optimization  N            CPU time [s]   final CA [mm²]   slack space [%]   final WL [mm]
                                                                             MBB        SPBH_I
no routing    20           14.18          0.210            5.35              -          -
no routing    40           161.3          0.544            3.52              -          -
no routing    80           1537           1.13             2.69              -          -
no routing    160          5687           2.19             4.13              -          -
no routing    320          20996          3.97             5.89              -          -
no routing    49 (ami49)   688.5          36.48            2.85              -          -
MBB           20           83.9           0.223            10.7              3.195      5.248
MBB           40           386.9          0.565            7.27              9.636      15.960
MBB           80           1373           1.20             8.64              18.759     39.211
MBB           160          5616           2.31             8.82              44.953     99.641
MBB           320          20685          4.22             11.37             118.435    264.120
MBB           49 (ami49)   999.3          38.51            7.96              713.98     796.18
SPBH_I        20           1253.9         0.212            6.54              3.567      4.892
SPBH_I        40           8043.7         0.573            8.45              10.141     14.440
SPBH_I        80           30174          1.240            11.35             20.264     32.872
SPBH_I        160          179990         2.386            11.84             50.565     88.242
SPBH_I        320          1332248        4.309            13.22             129.154    234.826
SPBH_I        49 (ami49)   68206          39.06            9.26              661.87     745.75
To substantiate this statement, Figure 7.20 shows the ratio of the total wire length obtained by SPBH_I and by MBB, as a function of the number of blocks N (with three independent runs per value of N).
[Plot: ratio of the total wire lengths of SPBH_I and MBB versus N (0 to 350), for runs 1, 2 and 3; the ratio lies between roughly 1.5 and 2.3.]
Figure 7.20: Ratio of the total wire length computed by SPBH_I and MBB, as a function of N (the size of the randomly generated benchmark).

It is easy to see that the correlation between SPBH_I and MBB is heavily problem-size dependent. In general, we may conclude that MBB routing is a bad predictor of the total wire length. Even stronger, MBB routing can significantly decrease placement quality. Furthermore, for the randomly generated problem instances, a clear trend towards a fixed ratio can be observed as N grows larger.


Despite the existence of a strong correlation between the wire-length estimations obtained by SPBH_I and MBB, this correlation is merely a statistical measure, which clearly does not guarantee that a decrease in the SPBH_I solution always corresponds to a decrease in the MBB solution. Therefore, the practical usefulness of MBB routing is highly questionable.

7.9.3 Conclusions
Summarizing, we can conclude the following.

- Our implementation of the simulated annealing optimization algorithm produces excellent packings, cf. [73, 69, 47]. Note that we did not optimize for speed, for instance by applying faster sequence-pair algorithms [81, 47].

- Coarse MBB routing does not correlate well with more accurate routing schemes such as SPBH_I routing. Therefore, it is not wise to apply MBB routing as a standard routing method to evaluate the quality of a routing-aware placement tool.15 However, we do observe quite a strong correlation between SPBH_I and MBB among several runs of the same problem instance. In other words, a fixed ratio can be computed, but unfortunately this gives a distorted notion because it is not guaranteed to hold for every final solution.

- The accuracy of the global routing estimation significantly impacts the quality of a block placement: a substantial decrease of about 6.3% to 16.2% in wire length can be observed for SPBH_I-based optimization, while the chip area increases by at most 3.3%.

- Accurate global routing, as compared to MBB routing, incurs a large penalty on the run-time performance of the optimization framework. The main culprits are the explicit construction of a global routing graph, which has to be updated for each net, and the complexity of the accurate global routing algorithm itself.

Although the proposed optimization framework works well on pure block placement instances, this does not necessarily imply good behavior when additional constraints such as wire length are introduced. However, it is unlikely that MBB-based routing would render the optimization convergence behavior radically different from SPBH_I-based routing. It should also be noted that for problem instances in which blocks have a large number of pins, the routing complexity starts to dominate the behavior of the optimization tool. As a consequence, for accurate routing to be practical for large problem instances, the routing complexity should be reduced substantially. This could, for instance, be accomplished by employing incremental techniques in conjunction with thorough optimization of the source code.

7.10 Concluding Remarks


In this chapter we gave the main ingredients for efficient obstacle-avoiding global routing methods, consisting of a global routing model and a global routing algorithm. We showed which points are important to consider and might thus be eligible for future improvements.
15 We observed that the ratio SPBH_I/MBB tends towards a value around 2.2 as N grows, for our set of randomly generated benchmarks.


A very important requirement for enabling incremental computation is that all data structures are fully dynamic. Of course, much effort is needed to implement these concepts properly, and it is even more difficult to implement these ideas with high run-time performance in mind. Since the practical run times of in-loop operations are very important in an iteration-intensive environment such as simulated annealing, optimizing the implementation should be considered, too.

Chapter 8

Dealing with Physical Phenomena: Parasitics, Crosstalk and Process Variations


The performance of high-frequency mixed-signal and analog designs relies heavily on the actual layout of the circuit components at device level. Therefore, proper placement and routing are of the utmost importance. However, conventional constraints on placement and routing, such as minimal area and minimal wire length, respectively, are no longer sufficient on their own. It has been acknowledged that previously neglected second-order effects must be taken into account. This chapter deals with the most important phenomena that can be handled by proper placement and routing. Roughly stated, these phenomena can be classified into:

- self-parasitics,
- crosstalk phenomena, and
- process variations.

The aforementioned phenomena are discussed in detail and their role in the context of mixed-signal layout generation is made clear. In order to minimize the detrimental effects of these phenomena, accurate models are required. However, due to the iterative nature of our stochastic optimization engine, the models must have a low associated computational complexity. We observe that, in general, very little effort has been dedicated to performance-driven optimization of layout in a pre-detailed-routing phase. We claim that performance issues should be taken into account as early as possible in the optimization phase, preferably during placement and global routing, in order to obtain high-quality layouts. This claim is clearly supported by the approach taken in this thesis. Substrate coupling is a crosstalk phenomenon which has not been considered much in the context of layout generation. Therefore, investigations are performed to gain more insight into this topic in connection with integrated placement and routing. A novel method is proposed which takes substrate coupling into account without increasing computational complexity. Experimental results demonstrate the practical feasibility of the method. Furthermore, we show that the approach can easily be mapped onto an incremental framework.


8.1 Previous Work


The amount of prior art with respect to crosstalk-aware, parasitics-aware, and process-variations-aware layout generation is rather limited. Most related works focus either on efficient modeling of delay and crosstalk phenomena [126], on parasitics-aware detailed routing [8], or on post-placement yield improvement by enhanced routing techniques [123, 8]. Typically, process variations are taken into account by matching techniques in the context of analog layout [6, 8]. The weakness of this approach is that the used design rules reduce the actual problems to human-manageable notions which do not adequately take into account all spatial and electrical considerations. Consequently, much room for improvement lies here from an algorithmic point of view. To the best of the author's knowledge, no previous work exists which handles, or attempts to handle, any of the above phenomena at the pre-detailed-routing level. An exception should be made with respect to process variations. Several researchers have attempted to use matching rules [6, 8], which are established design rules in analog layout, in an automated environment to reduce the adverse effects of process variations on circuit blocks which should resemble each other as closely as possible in every respect. However, the improvement, expressed in quantitative measures, resulting from this matching has barely been assessed in published works. What renders this issue even more complicated is the (lack of knowledge on the) quality and impact of routing in this context. It is interesting to note that in state-of-the-art mixed-signal designs, such as current-steering digital-to-analog converters, the matching problem due to process variations is very dominant [12] and thus needs to be dealt with. Recent work by Doris et al. [127] gave a fundamental theoretical basis to this problem, and the same researchers proposed an effective method to mitigate the influence of process variations in high-performance D/A converters.

8.2 Efficiency and Accuracy Requirements


Efficient and accurate modeling of significant performance-degrading phenomena is important for successful optimization of mixed-signal layout, for the following reasons:

- A very sophisticated model adds too much overhead to the overall computational complexity of the optimization framework, rendering the approach impractical.

- A coarse, inaccurate model can negatively impact the performance of the optimization engine and, in the worst case, cause convergence problems.

A trade-off between accuracy and efficiency is in general unavoidable, but it is important to keep in mind that the model should produce a consistent estimation of reality. In other words, it is better to have a reasonably constant over-estimation of 30% than an apparently more accurate but fluctuating estimation accuracy between -10% and 10%.

8.3 Self-Parasitics
Inherent physical properties of the materials which form a layout induce parasitic phenomena that are not adequately modeled by many automated layout generation systems. The self-parasitics, which consist of the resistance, capacitance and inductance of a wire, form a separate class of unwanted effects, based on the observation that the value of a self-parasitic depends solely on the geometrical properties of a single wire, independent of neighboring objects.



8.3.1 Wire Resistance, Capacitance and Inductance


The simplest wiring scenario consists of a single wire routed in a single layer. Although this is not always possible in reality, this situation is the basis of more elaborate wiring scenarios. Figure 8.1 illustrates the sources of self-parasitics of a piece of interconnect.


Figure 8.1: The sources of self-parasitics for a piece of interconnect.

The area capacitance depends on the thickness, width, height and length of the piece of wire. Furthermore, a so-called fringing capacitance exists which depends on the same parameters, but with a different weighting. Of course, the actual material of which the piece of interconnect is made plays a role, too. Besides the capacitance, there are also a series inductance and a series resistance associated with every piece of interconnect. Depending on the type of signal carried through the wire and on the material and geometry of the wire, either one of them may dominate the other. Typical values of the parasitic elements for a 0.5 µm CMOS process (the area capacitance in fF/µm² and the fringing capacitance in fF/µm for a metal1-metal2 scenario, and the sheet resistances in Ω/□ for metal1 and metal2) are listed in the process technology files. Normally, higher metal layers have smaller sheet resistances.
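As an illustration of how such technology data enters the tool flow, the following C sketch computes first-order self-parasitic estimates for a straight wire segment; the layer_tech_t constants are placeholders that would be read from the technology file, not values from this thesis.

typedef struct {
    double r_sheet;   /* sheet resistance      [ohm/square] */
    double c_area;    /* area capacitance      [fF/um^2]    */
    double c_fringe;  /* fringing capacitance  [fF/um]      */
} layer_tech_t;

/* Classical first-order estimates for a wire of length len and width w
 * (both in um): R = Rs*len/w and C = c_a*w*len + c_f*2*(w + len).      */
void wire_parasitics(const layer_tech_t *t, double len, double w,
                     double *r_ohm, double *c_ff)
{
    *r_ohm = t->r_sheet * len / w;
    *c_ff  = t->c_area * w * len + t->c_fringe * 2.0 * (w + len);
}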

8.3.2 Via Resistance and Area


When completion of a net in a single layer is not possible, for instance due to congestion problems, we are forced to use another routing layer. A so-called via is used to locally connect two pieces of interconnect in different layers; typically, these layers must be adjacent. Going from one layer to another does not come without a penalty: a via is relatively costly in terms of series resistance and area. For example, in a 0.5 µm CMOS process, a via has a typical series resistance of 0.5 Ω, with a maximum of 2.5 Ω. The typical value is equivalent to a 1 µm wide metal2 wire of length 9 µm. Moreover, vias typically come in pairs, which means that the equivalent amount of additional wiring is 18 µm for every wiring bridge. Furthermore, yield generally decreases with an increasing number of vias. Therefore, avoiding vias as much as possible is an important layout design rule.
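The quoted equivalence follows directly from the sheet-resistance relation R = Rs · L / W; the C helper below reproduces it, with the metal2 sheet resistance as an assumed placeholder value rather than a figure from the text.

/* Equivalent wire length of a via: solve R_via = Rs * L / W for L. */
double via_equivalent_length_um(double r_via_ohm, double rs_ohm_per_sq,
                                double w_um)
{
    return r_via_ohm * w_um / rs_ohm_per_sq;
}
/* With an assumed metal2 sheet resistance of 0.055 ohm/square:
 * via_equivalent_length_um(0.5, 0.055, 1.0) is roughly 9 um.   */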


8.4 Crosstalk
Crosstalk is the net effect of undesired signal propagation via parasitic coupling between objects in the layout. An effective remedy to lessen crosstalk is spatial separation of the objects that are subject to it; however, in most practical cases this is not a trivial problem to solve. In this thesis we discuss two different types of crosstalk sources: crosstalk due to substrate coupling and crosstalk due to parasitic coupling capacitance. A third source of crosstalk is magnetic coupling, modeled with parasitic mutual inductance. This last issue lies outside the scope of this thesis, although we note that magnetic coupling effects could also be incorporated into our framework with some effort. By improving our understanding of the mechanisms of crosstalk and their effect on performance, we can find better means to reduce detrimental crosstalk effects in the proposed integrated placement and routing framework.

8.4.1 Substrate Coupling


The layout of an integrated circuit is embedded in a piece of silicon. Ideally, this carrier, better known as the substrate, should not influence the operation of the IC. Unfortunately, due to non-ideal properties of the substrate, i.e. its finite resistance, it acts as a conductive layer which propagates signals via parasitic coupling to and from many points in the circuit. The layout objects which have the lowest impedance to the substrate are also most severely affected by the signal(s) carried by the substrate. In principle, a MOS transistor can be coupled as strongly to the substrate as a piece of interconnect, but this strongly depends on the actual geometry of the objects. Moreover, in the case of an NMOS transistor residing in a P-type substrate, the voltage across the NP-junction determines the effective coupling capacitance to the substrate. The backgate capacitance of a MOS transistor is also a strong source of substrate coupling. We distinguish two different types of substrate: high-resistivity substrates and low-resistivity substrates. Figure 8.2 shows how the semiconductor materials in these substrate types are typically doped and layered. The high-resistivity substrate is composed of a lightly doped bulk region, which is about 200 to 400 µm thick, and a thin epi-layer with a lower resistivity. The low-resistivity substrate consists of a lightly doped p-type epi-layer grown on a heavily doped p-type bulk; the bulk is typically 100 to 400 µm thick, and the epi-layer thickness varies from 5 to 15 µm. One of the many advantages of high-resistivity substrates is that they preserve conventional circuit design techniques, whereas low-resistivity substrates require significantly different circuit design techniques because the substrate can be viewed as a single super-node [128]. Another reason to choose a high-resistivity substrate is that this type of substrate allows layout manipulation to decrease the adverse effect of substrate coupling on circuit performance, while layout techniques are of less use for substrate coupling reduction in connection with low-resistivity substrates. High-resistivity substrates also allow for the creation of better on-chip passive components. A drawback of high-resistivity substrates, however, is the latch-up phenomenon, which introduces unwanted parasitic transistors in the circuit with all their consequences; low-resistivity substrates virtually do not suffer from latch-up. Fortunately, with the advent of higher-frequency circuits and lower supply voltages, latch-up becomes increasingly less of a problem [129].


[Figure: (a) a high-resistivity substrate, with a thin p-type epi-layer on a lightly doped p− bulk (7-15 Ωcm, about 400 µm thick); (b) a low-resistivity substrate, with a p-type epi-layer (about 10 µm, 0.1 Ωcm) on a heavily doped p+ bulk (10 mΩcm, about 400 µm thick).]

Figure 8.2: Two fundamentally different types of substrate: (a) a high-resistivity substrate, and (b) a low-resistivity substrate.

A simple model for a high-ohmic substrate, semi-empirically determined by Joardar [130], is shown in Figure 8.3. This model fits perfectly in our stochastic optimization framework, since it can be evaluated quickly. It should be noted that this model holds for guarded modules, but application to unguarded modules is justified [131] if we only want to minimize the influence of substrate coupling. Nodes A and B in the circuit are connected to specific points in the integrated circuit, for instance the drains of two separate MOS transistors.


Figure 8.3: A simple substrate model.

In the case of a bulk contact, the capacitor should be replaced by a short circuit. The resistances in the model depend strongly on process parameters and on the geometry of the layout modules. This information can easily be stored in a parameterized manner, since the layout module shapes are known in advance and very regular. The coupling resistance R between the two substrate ports is of most interest to us, since it depends on the actual module placement. A closed-form expression for R is given by (8.1), in which L is the effective lateral coupling length between the two coupled objects, d is the spacing between these objects, and the remaining parameters are fitting constants for a given process. The form of the


equation used to model R is physically based, and obtained by solving the Laplace equations for two circular substrate contacts [131]. Its slightly complicated form arises because three-dimensional effects are included. Furthermore, since no simple expression exists for rectangular geometries, the one available for circular contacts was used as an approximation. In the remainder of this chapter, the remaining resistances and the junction capacitances of the model will be ignored for simplicity, but without loss of generality.

8.4.2 Parasitic Coupling Capacitance


In practice, the interconnect of a layout consists of many adjacent wires within the same layer or on distinct layers. In the situation where the adjacent wires lie on the same layer, we speak of line-to-line or lateral capacitance. The other cases are covered by the area capacitance and the fringing capacitance shown in Figure 8.1. To complete the picture, a simplified scenario is drawn in Figure 8.4.


Figure 8.4: A simplified scenario which shows all parasitic capacitances from a piece of conducting material to its environment.

It is clear that the lateral capacitance depends on the distance between the objects on the same layer and on the longest common length of the parallel-running parts of the lateral objects. More information on the values of these capacitances can be found in the specific technology files. It is also possible to derive reasonably accurate closed-form expressions for many important parasitic phenomena in connection with wiring [126].

8.5 Process Variations


Process variations consist of systematic and random errors which occur due to non-idealities of IC manufacturing equipment. To name a few examples: non-uniform layer thickness or doping across the wafer, under-etching and over-etching, mask mis-alignments, etc. The impact of these errors on circuit performance can be tremendous. For instance, differences in the threshold voltages of switching transistors can easily lead to signal skew. This, in turn, can lead, for instance, to a reduced spurious-free dynamic range in the case of digital-to-analog converters.


It is well-known that at least the systematic errors are strongly correlated with the location of the geometrical objects in a layout [80]. Therefore, it is important to take these effects into account in the layout phase, so that the detrimental effects of process variations can be reduced as much as possible. Proper matching of the transistors of a differential pair is a well-known issue in analog layout design. In general, we can speak of matched circuit or layout modules, in which a module can consist of a single transistor, but can also be a passive element or even a small subcircuit. Furthermore, it is important to note that matching constraints are essentially equivalent to relative placement constraints. These types of constraints can be taken into account by an efficient placement representation such as the sequence pair. However, the approach proposed by Balasa and Lampaert [45] is not efficient in terms of the computational complexity of a single constrained placement evaluation. Furthermore, their approach induces a more irregular cost landscape and, therefore, worse convergence of the simulated annealing optimization algorithm. An approach based on the constrained placement work of Tang and Wong [47] is likely to offer better results. Finally, we note that although virtually all analog layout matching efforts have focused on symmetric placement, symmetric routing is at least as important in this context. The latter has, to the best of our knowledge, never been explored in depth in the context of layout generation. The term symmetric is, in our notion, not restricted to geometric symmetry: albeit sufficient, geometric symmetry is not necessary, since symmetric signals through the matched interconnect are the ultimate goal. In this thesis we do not elaborate further on the issue of process variations in connection with mixed-signal layout generation.

8.6 Incorporating Crosstalk and Parasitics into Routing


The importance of crosstalk-aware and parasitics-aware routing is generally acknowledged by both industry and academia. However, these effects are mostly incorporated at the detailed routing level; only a limited number of works have considered them at the global routing level [132, 133]. To the best of the author's knowledge, no works have considered these effects at the pre-detailed-routing level in an integrated placement and routing stochastic optimization framework. Since expensive ripup-and-reroute strategies are to be avoided, it is better to estimate the amount of crosstalk beforehand as well as possible, and eventually perform detailed routing based on the more flexible global routing information. Although the incorporation of crosstalk and parasitics into routing is of the utmost importance for efficient detailed routing, it could not be handled within the scope of the present research work.

8.7 Incorporating Substrate Coupling into Placement


In this section we investigate the incorporation of substrate coupling into the placement phase of modules. In principle, substrate coupling also occurs for wiring, but this aspect is not covered in this thesis. In order to estimate the amount of substrate coupling, the simple substrate model of Figure 8.3 is used. For the calculation of the coupling resistance in this model, exact information on the geometry and location of each module connected to nodes A and B is needed.


Therefore, we have to define exactly how a module is composed. The essentially one-dimensional model of Joardar in the form of (8.1) is then generalized to two dimensions. In the context of sequence-pair-based block placement, we propose a novel method to handle the slack space that exists in most placements, such that the impact of substrate coupling is minimized. The algorithm accomplishing this is based on expanding the core module and shifting it within the expanded module space so that the total impact of substrate coupling is minimized. More explicitly, in terms of the simple substrate coupling model of Figure 8.3, the task is to reduce the coupling, the actual impact of which is evaluated by means of a priori obtained sensitivity values. Note that the overall optimization procedure does not imply that placements with a large amount of slack space are a priori bad. On the contrary, introducing additional slack space might reduce the overall impact of substrate coupling. With a properly chosen balance between chip area, total wire length and substrate coupling impact, the simulated annealing algorithm stochastically searches for a placement which adheres to the given cost function (see (4.3)). Experimental results show the effectiveness and efficiency of the approach.

8.7.1 A Basic Module


The atomic elements that can be manipulated during optimization are rectangular modules. A module has connecting pins on all four sides, which means that modules are adequate to represent devices such as transistors, capacitors, inductors and resistors. The space occupied by a rectangular module is subdivided into core module space, routing space, and expansion space. Figure 8.5 depicts the basic module and the necessary data-structure elements. The data structure of the basic module consists of:

Figure 8.5: A basic module.

(x, y): the lower left coordinate of the enclosing module;
(W, H): the width and height of the enclosing module;
(dx, dy): the x- and y-offsets of the core module's lower left corner;


(w, h): the width and height of the core module;
(rt, rr, rb, rl): the top, right, bottom and left routing space widths.
If W = w + rl + rr and H = h + rb + rt, then we call the enclosing module tight; otherwise the enclosing module is loose and we have expansion space. Hereafter, routing issues are disregarded (at least their details); we included them here for completeness.
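A minimal C sketch of this record; the field names mirror the list above but are otherwise illustrative, and coordinates are kept in integer layout-grid units so that the tightness test can use exact comparison.

typedef struct {
    long x, y;            /* lower left corner of the enclosing module  */
    long W, H;            /* width and height of the enclosing module   */
    long dx, dy;          /* offsets of the core module's lower left    */
    long w, h;            /* width and height of the core module        */
    long rt, rr, rb, rl;  /* top/right/bottom/left routing space widths */
} basic_module_t;

/* Tight: the enclosing module exactly fits the core module plus its
 * routing space; any surplus is expansion space for shifting the core. */
int module_is_tight(const basic_module_t *m)
{
    return m->W == m->w + m->rl + m->rr &&
           m->H == m->h + m->rb + m->rt;
}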

8.7.2 Generalized 2-Dimensional Substrate Coupling Model


Since two placed modules can be positioned in many ways relative to each other, a simple one-dimensional distance measure to estimate the coupling resistance is clearly not sufficient. Therefore, the value of the resistance R in the substrate model of Figure 8.3 should be made dependent on the amount of skew between two modules A and B and on the distance between the two modules. This idea is visualized in Figure 8.6.


Figure 8.6: Definition of the skew and distance between two placed modules A and B.

We propose a simple geometrical method to compute the effective coupling in cases where modules are skewed. In addition, it is made plausible that the original expression for R should be modified slightly in order to incorporate the proximity effects of modules with different dimensions. These notions are shown in Figure 8.7. It is clear that the refinement of the substrate coupling resistance in a two-dimensional setting is based on geometrical arguments. For the case shown in Figure 8.7(a), two additional terms should be added to the computation of R, giving it the following general shape when two modules are not fully skewed (Figures 8.7(a) and (b)):

R = R_par + κ (asin r1 + asin r2)    (8.2)

where R_par denotes the lateral parallel term in the sense of (8.1), r1 and r2 are geometric ratios that follow from the skew configuration, and κ is an additional constant which is used for fitting. In the case of a fully skewed placement of two modules A and B, the lateral parallel coupling between A and B in the original sense of (8.1) has vanished. Instead, the following formula, which is derived from Figure 8.7(c), holds:

R = κ (asin r3 + asin r4)    (8.3)

where r3 and r4 are treated similarly to r1 and r2.


(a) A modification of the original expression for the coupling resistance takes into account the fringing effect.

(b) When modules are partially skewed, fringing effects arise from two adjacent sides of the modules.

(c) In the case where modules are fully skewed, we only have fringing effects and no lateral parallel effect.

Figure 8.7: The refined substrate coupling scenarios which take the fringing effects into account for proper estimation of the substrate coupling resistance in a two-dimensional setting.

It is a straightforward task to express (8.2) and (8.3) in terms of the geometrical and spatial properties of the basic modules. Finally, the amount of substrate coupling between modules i and j is inversely proportional to the resistance R and is denoted by c(i, j).
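One plausible way to derive the skew and distance of Figure 8.6 from two core-module footprints is sketched below in C for horizontally adjacent modules (the vertical case is symmetric); the rect_t type and these exact definitions are illustrative assumptions, not formulas taken from the text.

typedef struct { long x, y, w, h; } rect_t;   /* core module footprint */

/* Horizontal case: distance is the gap between the facing sides; skew
 * measures how far the vertical extents fail to overlap (zero while
 * the facing sides still overlap, positive once fully skewed).        */
void skew_and_distance(const rect_t *a, const rect_t *b,
                       long *skew, long *distance)
{
    long gap = (b->x > a->x) ? b->x - (a->x + a->w)
                             : a->x - (b->x + b->w);
    long lo  = (a->y > b->y) ? a->y : b->y;                 /* max bottom */
    long hi  = (a->y + a->h < b->y + b->h) ? a->y + a->h
                                           : b->y + b->h;   /* min top    */
    *distance = (gap > 0) ? gap : 0;
    *skew     = (hi > lo) ? 0 : lo - hi;
}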

8.7.3 Substrate Coupling Impact Minimization


A high level of substrate coupling does not necessarily mean that circuit performance is badly affected. To map the amount of substrate coupling onto circuit performance, circuit sensitivities are required (see Chapter 2). Let us assume that these sensitivities are known; this brings us to the ultimate goal of minimizing the impact of substrate coupling.


The impact (on a given performance measure) of substrate coupling from module i onto module j is defined by¹

impact(i → j) = s(j) · n(i) · c(i, j)    (8.4)

where s(j) is the substrate coupling sensitivity of the performance function defined on module j and n(i) is the noisiness of module i. The sensitivities can, for instance, be obtained a priori from circuit simulation. The noisiness of module i depends on both the amplitude and the time-derivative of a predefined electrical property. Note that module i is assumed to be fixed in location, while the optimal position of module j is to be determined. The problem of minimizing the impact of substrate coupling can be stated as follows.

Problem: Substrate Coupling Impact Minimization
Instance: A placement of modules associated with a sequence pair, with given chip area dimensions.
Solutions: All possible non-overlapping absolute placements of the modules that do not violate the relative relationships dictated by the sequence pair.
Minimize: The total impact of substrate coupling, i.e. the sum of impact(i → j) over all module pairs (i, j), subject to the given chip area dimensions.

Note that the problem can be seen as a force-balanced constrained mechanical system with given initial conditions, but with strongly nonlinear relationships between the components. Clearly, it is too costly to solve this problem to optimality in the context of our stochastic optimization framework. Therefore, we simplify the problem in three respects.

1. We introduce a rectangular window around every module which limits the number of surrounding modules that affect the module to be shifted to an optimal location. This is a reasonable limitation since, in practice, the modules that lie further away will be shielded by closer modules.

2. We accept a sub-optimal solution due to the procedure of selecting the modules to be processed sequentially in order of decreasing amount of expansion space. This, too, is an acceptable limitation, since subsequent modules can never be shifted more than the maximum allowable amount, thus lessening the effect on previously shifted modules.

3. Within the constrained minimum-impact location problem for a single module, any locally optimal solution is accepted. The underlying reason is that finding the global solution of this nonlinear function minimization problem of two variables is computationally expensive, while the additional gain might not be much.
¹ Note that the impact is not symmetric, i.e. K_{i→j} ≠ K_{j→i} in general, but the coupling resistance itself is: R_ij = R_ji.
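As a concrete illustration, here is a minimal sketch in C of evaluating the reconstructed (8.4) over all module pairs; the struct fields and function names are our own assumptions, not the thesis implementation:

```c
#include <stddef.h>

/* Illustrative sketch: per-module sensitivity S and noisiness N. */
typedef struct { double S; double N; } Module;

/* Impact of module i on module j, following the reconstructed (8.4). */
double coupling_impact(const Module *m, size_t i, size_t j, double Rij)
{
    return m[j].S * m[i].N / Rij;   /* K_{i->j} */
}

/* Total impact over all ordered pairs, given an n*n resistance matrix R. */
double total_impact(const Module *m, size_t n, const double *R)
{
    double K = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            if (i != j)
                K += coupling_impact(m, i, j, R[i * n + j]);
    return K;
}
```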


8.7.4 An Efficient Substrate Coupling Impact Minimization Algorithm


In order to tackle the substrate coupling impact minimization problem, it is necessary to find the surrounding modules of a given module efficiently. For that purpose we use the corner stitching data structure, which enables efficient module enumeration. As discussed in Chapter 5, finding the neighbors of a given module can be performed in time proportional to their number, which is typically close to the ratio of the selected area (range window) to the total chip area, times the total number of modules n. For each set of enumerated neighboring modules, a function minimization problem has to be solved in a pre-defined order. Since we settle for a local minimum, an efficient off-the-shelf algorithm can be used for this [134, 135]. We consider the one-dimensional case of the problem for simplicity and ease of implementation, but without loss of generality with respect to the simplified version of the problem as described before. The overall algorithm is based on the ideas of module enumeration and expansion which have been explained in Chapter 6, Section 6.10. The most important implication of the proposed packing-to-sequence-pair algorithm is that it enumerates and expands all modules in linear computational complexity. With this information we can devise the algorithm given in Figure 8.8 to (locally) minimize the impact of substrate coupling.
1. Enumerate and expand all modules and put the expanded modules in sorted order in a data structure, the module with the smallest expansion space last. Set i := 1.

2. Extract the module m_i with the largest expansion space from the data structure and enumerate all neighbors of m_i that fall within the range window. Solve the (two-dimensional) function minimization problem. Let the solution be (x*, y*). Set the absolute position of module m_i to (x*, y*).

3. If i = n, then stop; otherwise set i := i + 1 and go to step 2.

Figure 8.8: The substrate coupling impact minimization algorithm.

Clearly, the computational complexity of step 1 is the sum of enumerating and expanding all modules, and putting them in sorted order in a data structure. This can be performed in linear time, on average. Step 2 consists of extracting an unprocessed module with the largest expansion space from the data structure and enumerating the neighbors of that module. Since the modules are stored in sorted order, the extraction step takes constant time. The computational effort to enumerate the neighbors of a module depends on the size of the range window and is given by (5.4). Furthermore, finding a (local) minimum of the substrate coupling impact is roughly proportional to the number of terms in the function to be minimized. This, in turn, is proportional to the number of neighboring modules. Consequently, the cost of one execution of step 2 is dominated by the latter. Since step 2 is repeated exactly n times, the overall computational complexity of the algorithm is proportional to n times the average number of neighbors within the range window. When the latter does not depend on n, the overall complexity is linear in n. Compared with a from-scratch computation of a packing, we may conclude that no additional computational overhead is induced by the substrate coupling impact minimization problem.
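The following is a minimal, self-contained sketch in C of the one-dimensional SCIM loop of Figure 8.8, assuming the module array is already sorted by decreasing expansion space; the toy resistance model, the window test and all names are illustrative assumptions, and the off-the-shelf local minimizer [134, 135] is stood in for by a plain golden-section search:

```c
#include <math.h>

typedef struct {
    double x;          /* current absolute x-position             */
    double lo, hi;     /* allowed interval from the expansion space */
    double S, N;       /* sensitivity and noisiness               */
} Mod;

typedef struct { Mod *m; int n, self; double win; } Ctx;

static double r_lateral(double d) { return 1.0 + d; }  /* toy monotone R(d) */

/* Impact exchanged between module `self` placed at x and its neighbors
 * inside the range window (cf. simplification 1 in Section 8.7.3). */
static double impact_at(double x, void *p)
{
    Ctx *c = (Ctx *)p;
    double K = 0.0;
    for (int j = 0; j < c->n; ++j) {
        if (j == c->self) continue;
        double d = fabs(x - c->m[j].x);
        if (d > c->win) continue;                 /* outside the window */
        double G = 1.0 / r_lateral(d);            /* coupling ~ 1/R     */
        K += c->m[j].S * c->m[c->self].N * G      /* self -> j          */
           + c->m[c->self].S * c->m[j].N * G;     /* j -> self          */
    }
    return K;
}

/* Golden-section search: any locally optimal solution is accepted. */
static double golden_min(double (*f)(double, void *), void *p,
                         double a, double b, double tol)
{
    const double g = 0.6180339887498949;          /* (sqrt(5)-1)/2 */
    double x1 = b - g * (b - a), x2 = a + g * (b - a);
    double f1 = f(x1, p), f2 = f(x2, p);
    while (b - a > tol) {
        if (f1 < f2) { b = x2; x2 = x1; f2 = f1;
                       x1 = b - g * (b - a); f1 = f(x1, p); }
        else         { a = x1; x1 = x2; f1 = f2;
                       x2 = a + g * (b - a); f2 = f(x2, p); }
    }
    return 0.5 * (a + b);
}

/* Steps 2 and 3 of Figure 8.8; m[] sorted, largest expansion space first. */
void scim(Mod *m, int n, double win)
{
    for (int i = 0; i < n; ++i) {
        Ctx c = { m, n, i, win };
        m[i].x = golden_min(impact_at, &c, m[i].lo, m[i].hi, 1e-6);
    }
}
```

Under these assumptions, one call to scim costs a number of impact evaluations proportional to n times the average number of window neighbors, which matches the linear behavior reported in Table 8.1 when that number does not grow with n.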

8.7.5 Implementation Considerations


In an actual implementation, sorting of the modules with respect to their expansion space size must be handled carefully.


Table 8.1: Practical CPU times of one SA iteration with the proposed substrate coupling impact minimization (SCIM) method (second column), and the SCIM time alone (third column).

# modules   total time [s]   SCIM time [s]
   200          0.871            0.075
   300          2.012            0.101
   400          3.792            0.133
   500          6.240            0.156
   600          9.873            0.178
   700         14.221            0.206
   800         21.380            0.250

Since a straightforward sorting procedure for n elements easily incurs O(n log n) complexity, it is better to choose a sorting algorithm such as bucket sort. The use of bucket sort is justified, since we may assume that the input distribution of expansion-space values is semi-random.
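A sketch of such a bucket sort in C, assuming (as the text suggests) keys that are roughly uniformly distributed over a known range [0, kmax]; all names are illustrative assumptions:

```c
#include <stdlib.h>

typedef struct Node { int id; double key; struct Node *next; } Node;

/* Sorts module ids by key (expansion space) in descending order using nb
 * buckets over [0, kmax]; O(n) on average for near-uniform keys. Returns a
 * malloc'ed array of n ids, largest expansion space first. */
int *bucket_sort_desc(const double *key, int n, double kmax, int nb)
{
    Node  *pool   = malloc((size_t)n * sizeof *pool);
    Node **bucket = calloc((size_t)nb, sizeof *bucket);
    for (int i = 0; i < n; ++i) {
        int b = (int)(key[i] / kmax * (nb - 1));   /* bucket index       */
        pool[i].id = i; pool[i].key = key[i];
        Node **pp = &bucket[b];                    /* keep bucket sorted */
        while (*pp && (*pp)->key > key[i]) pp = &(*pp)->next;
        pool[i].next = *pp; *pp = &pool[i];
    }
    int *out = malloc((size_t)n * sizeof *out), k = 0;
    for (int b = nb - 1; b >= 0; --b)              /* largest keys first */
        for (Node *p = bucket[b]; p; p = p->next)
            out[k++] = p->id;
    free(bucket); free(pool);
    return out;
}
```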

8.7.6 Experimental Results


The substrate coupling impact minimization problem has been implemented in C and evaluated on a Linux operating system running on a Pentium MMX 200 MHz CPU with 64 Mbytes of RAM. We will show how much the practical run-time of a simulated annealing optimization iteration increases after incorporating the expansion and substrate coupling impact minimization algorithm. We also show that the practical run-times of the proposed method are linear in the size of the problem instance. The simulations are performed on a batch of randomly generated problem instances. The results are summarized in Table 8.1. Moreover, Figure 8.9 shows graphically the relations between the CPU times and the problem instance sizes n.

Figure 8.9: Plots of the CPU time of one SA iteration as a function of the problem instance size n: (a) total time and (b) SCIM time only.

Figure 8.9(a) shows a superlinear relationship between the CPU time and the number of modules for one complete SA iteration, whereas Figure 8.9(b) shows a linear relationship for the substrate coupling impact minimization algorithm, as expected. Note that the packing computation algorithm is the one originally proposed by Murata et al. [48]. For visual satisfaction, graphical representations of the standard packing results and the expanded and optimized packing results for a set of ten modules are shown in Figure 8.10.


The optimization is performed using randomly generated substrate coupling sensitivities and noisiness values.

(a) Standard packing.
(b) Packing with corner stitching.
(c) Optimized packing.

Figure 8.10: The packing of 10 modules corresponding to a given sequence pair (a) without and (b) with explicit representation of empty space using corner stitching, and (c) optimized with respect to substrate coupling impact for given sensitivity values.

8.7.7 Conclusions
We presented a new and efficient substrate coupling impact minimization (SCIM) algorithm that enables efficient incorporation of substrate problems into an iterative placement optimization loop. Substrate coupling has been recognized as one of the major physical design bottlenecks for high-performance high-frequency mixed-signal circuits. Therefore, minimizing the impact of substrate coupling will result in better designs in fewer design iterations. Results of simulations performed on randomly generated medium to large problem instances clearly show that the practical run-time of the SCIM algorithm is linear in the problem instance size, which is optimal in the context of a from-scratch computation of a packing. It should be noted that in order to incorporate the influence of the coupling capacitances, more iterations are needed.

8.8 Incremental Substrate Coupling Impact Minimization


In the same line as incremental placement and incremental routing, we propose to put substrate coupling impact minimization in the context of incremental computation. Indeed, it is almost trivial to compute the impact of substrate coupling in an incremental fashion. Once it is known which modules have changed position due to a perturbation of the simulated annealing algorithm, we only have to re-arrange that set of modules.


Essentially, this comes down to applying the SCIM algorithm given in Figure 8.8 to the restricted set of moved modules. It is easy to see that the overall computational complexity of the incremental algorithm is proportional to the number of moved modules. Note that the latter holds under the assumption that enumerating these moved modules can be performed efficiently, i.e. in time proportional to the number of moved modules.
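Reusing the types and helpers from the SCIM sketch in Section 8.7.4 (again, illustrative names only), the incremental variant simply restricts the relocation loop of Figure 8.8 to the modules moved by the last perturbation:

```c
/* Re-optimize only the modules whose positions changed; the cost is
 * proportional to n_moved (times the window-neighbor count). */
void scim_incremental(Mod *m, int n, double win,
                      const int *moved, int n_moved)
{
    for (int k = 0; k < n_moved; ++k) {
        int i = moved[k];
        Ctx c = { m, n, i, win };
        m[i].x = golden_min(impact_at, &c, m[i].lo, m[i].hi, 1e-6);
    }
}
```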

8.9 Concluding Remarks


We have given an overview of some very important physical phenomena that must be taken into account for the purpose of generating a high-quality mixed-signal layout. These phenomena are: self-parasitics, crosstalk and process variations. A specific type of crosstalk due to substrate coupling is considered in depth, and a novel approach to minimize the negative impact of substrate coupling is proposed. Experimental results show that the method is practically feasible, although the examples are not taken from real-life circuits. In the same line of thought as for placement and routing, the additional constraints induced by physical phenomena can also be taken into account in an efficient incremental manner.


Chapter 9

Conclusions and Directions for Future Research


In this chapter we briefly summarize our conclusions based on the overall results which have been elaborated in the preceding chapters. Moreover, due to time restrictions imposed on this research work, many interesting aspects which emerged during promising and less fruitful attempts to solve a heterogeneous mixture of theoretical and practical problems were put aside. A brief overview is given of issues which are important from our standpoint. These matters should be investigated further in order to improve our insight into the problem and the solution method, and consequently to improve mixed-signal layout generation.

9.1 Conclusions

The implementation details of simulated annealing have a tremendous impact on the performance of the algorithm in practice. This is an issue which is mostly left undiscussed in virtually all papers that employ simulated annealing for global optimization. We have shown that knowledge of efficient algorithms and advanced data structures is of utmost importance when designing new efficient algorithms for mixed-signal layout generation.

We have proposed and implemented an efficient incremental framework for computing accurate block placements under the constraint of several user-definable parameters. The efficiency of the incremental approach is backed up by concise theoretical arguments. The average computational complexity for a single incremental computation is better than any previously reported result.

A new consistent (idempotent) linear-time placement-to-sequence-pair mapping algorithm is proposed. The algorithm is useful, for example, in the context of converting graphical user-interface data to an abstract format.

An improved, more robust, and easy to implement constrained block placement algorithm has been proposed which improves significantly over previous results. However, the naive implementation, which leaves room for improvement, is slower than the original tuned algorithm.


A new method for constructing an efficient global routing graph from a placement of modules has been proposed. The method has a low average computational complexity in terms of the number of placed modules, and under some reasonably weak conditions this complexity can be reduced even further. An important feature of the new construction is the fact that dynamic changes in the graph are supported and can be performed efficiently.

We have devised new efficient global routing algorithms for finding obstacle-avoiding routes of multi-pin nets in the proposed global routing graph. These heuristics have been extensively benchmarked on a large set of routing problem instances derived from sequence-pair placements. The heuristic results are compared with optimal results which have been obtained using state-of-the-art third-party tools. The fact that not all problem instances were solvable to optimality demonstrates the difficulty of the problem instances (and of the routing problem).

A set of tests has been performed with the integrated accurate sequence-pair placement representation and the accurate obstacle-avoiding global routing heuristic in the simulated annealing optimization loop. The outcome of our experiments demonstrates unambiguously that the current de-facto standard minimal-bounding-box routing method does not qualify for finding good placements while minimizing actual global routing length.

Substrate coupling can be taken into account efficiently and in an incremental manner using a linear-complexity algorithm. Using pre-computed sensitivity values, we show that the impact of substrate coupling can easily be (locally) minimized.

9.2 Directions for Future Research


In connection with routing, it is expected that the use of multiple high-quality global routing solutions for a given net, as opposed to using only the best found solution, will improve the overall quality of the global routing result. This approach essentially tackles the net ordering problem, which is a very hard problem. Using multiple high-quality solutions for a single net will not make the problem tractable, however. On the contrary, this version of the multi-net routing problem can be proven to be NP-hard under the constraint of uniform wire spreading, by showing that it is in essence a maximum subset-sum problem, which is known to be NP-hard [34]. Furthermore, we would generally like to have all wires uniformly distributed over the chip area. Several reasons can be brought forward in this respect. To name a few:

Uniform wire distribution implies that modules are expanded evenly. This in turn means that the quality of an interconnect will not suffer too much from the change in module positions. Although the quality of an interconnect might degrade due to longer length, compared to an optimal-length interconnect in the expanded scenario, the relative quality should be quite insensitive to a uniform expansion operation.

From a manufacturability/yield point of view, it can be advantageous to spread power-dissipating wires over a larger area so that the temperature is more evenly distributed over the chip; moreover, the occurrence of so-called hot spots, which can eventually cause performance degradation over time, might be prevented.


The detrimental effect of parasitic coupling can be somewhat lessened by proper wire spreading. At least the impact of wire coupling can be assessed and handled more easily when the number of wires in a single region is reduced.

In order to take the step to detailed routing, we need to make sure that enough routing space is reserved. Reserving enough space can be seen as a module expansion problem: how much do we need to expand each module? This, in turn, depends on which global route segments are assigned to which module, mapping the expansion problem onto an assignment problem. The latter is an important problem which needs to be investigated in detail. Last but not least, we note that temperature issues are also important to consider in the context of dealing with other physical constraints, since temperature gradients can be considered to be as bad as process variations, especially in connection with matched circuit components. However, to perform temperature analysis in an accurate way, we need to estimate power accurately. It is well known that the latter is a non-trivial problem and an active field of research. Efficient models to estimate power can help us in quickly determining (dominant) temperature profiles, which in turn can be used in an iterative optimization framework. Much research still needs to be performed in this respect.


Bibliography
[1] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
[2] E. Malavasi, Techniques for Performance-Driven Layout of Analog Integrated Circuits, M.S. thesis, University of California, Berkeley, 1993.
[3] E. Charbon, Constraint-Driven Analysis and Synthesis of High-Performance Analog IC Layout, Ph.D. thesis, University of California, Berkeley, 1995.
[4] H. Chang, A Top-Down, Constraint-Driven Design Methodology for Analog Integrated Circuits, Ph.D. thesis, University of California, Berkeley, 1994.
[5] E.S. Ochotta, R.A. Rutenbar, and L.R. Carley, Synthesis of High-Performance Analog Circuits in ASTRX/OBLX, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 3, pp. 273-294, March 1996.
[6] J.M. Cohn, D.J. Garrod, R.A. Rutenbar, and L.R. Carley, KOAN/ANAGRAM II: New Tools for Device-Level Analog Placement and Routing, IEEE Journal of Solid-State Circuits, vol. 26, no. 3, pp. 330-342, March 1991.
[7] J.M. Cohn, D.J. Garrod, R.A. Rutenbar, and L.R. Carley, Analog Device-Level Layout Automation, The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 1994.
[8] K. Lampaert, Analog Layout Generation for Performance and Manufacturability, Ph.D. thesis, Katholieke Universiteit Leuven, 1998.
[9] C. Lin, D.M.W. Leenaerts, and A.H.M. van Roermund, Faster Incremental VLSI Placement Optimization, in Proc. 15th European Conference on Circuit Theory and Design, 2001, vol. II, pp. 153-156.
[10] C. Lin and D.M.W. Leenaerts, A New Efficient Method for Substrate-Aware Device-Level Placement, in Proc. ASP-DAC 2000, January 2000, pp. 533-536.
[11] B.R. Stanisic, N.K. Verghese, R.A. Rutenbar, L.R. Carley, and D.J. Allstot, Addressing Substrate Coupling in Mixed-Mode ICs: Simulation and Power Distribution Synthesis, IEEE Journal of Solid-State Circuits, vol. 29, pp. 226-238, 1994.
[12] G.A.M. van der Plas, J. Vandenbussche, W. Sansen, M.S.J. Steyaert, and G.G.E. Gielen, A 14-bit intrinsic accuracy Q2 random walk CMOS DAC, IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1708-1718, December 1999.


[13] T. Koch, A. Martin, and S. Voß, SteinLib: An Updated Library on Steiner Tree Problems in Graphs, Tech. Rep. ZIB-Report 00-37, Konrad-Zuse-Zentrum für Informationstechnik Berlin, http://elib.zib.de/steinlib, 2000.
[14] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, Wiley, Chichester, 1990.
[15] N.A. Sherwani, Algorithms for VLSI Physical Design Automation, Kluwer Academic, 1993.
[16] R.J. Baker, H.W. Li, and D.E. Boyce, CMOS Circuit Design, Layout and Simulation, IEEE Press Series on Microelectronic Systems, IEEE Press, 1998.
[17] E. Malavasi and E. Charbon, Constraint transformation for IC physical design, IEEE Transactions on Semiconductor Manufacturing, vol. 12, no. 4, pp. 386-395, 1999.
[18] G. Jusuf, P.R. Gray, and A.L. Sangiovanni-Vincentelli, CADICS - Cyclic Analog-to-Digital Converter Synthesis, in Proc. IEEE International Conference on Computer Aided Design, November 1990, pp. 286-289.
[19] F.K. Hwang, D.S. Richards, and P. Winter, The Steiner Tree Problem, vol. 53 of Annals of Discrete Mathematics, North-Holland, Amsterdam, 1992.
[20] A.B. Kahng and G. Robins, On Optimal Interconnections for VLSI, The Kluwer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 1995.
[21] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
[22] C. Bliek, P. Spellucci, L.N. Vicente, A. Neumaier, L. Granvilliers, E. Monfroy, F. Benhamou, E. Huens, P. van Hentenryck, D. Sam-Haroud, and B. Faltings, Algorithms for Solving Nonlinear Constrained and Optimization Problems: The State of The Art, http://www.mat.univie.ac.at/~neum/glopt/coconut/, June 2001.
[23] C. Darwin, The Origin of Species, John Murray, London, 1859.
[24] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, 1973.
[25] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, Optimization by simulated annealing, Science, vol. 220, no. 4598, pp. 671-680, 1983.
[26] R. Dawkins, The Selfish Gene, Oxford University Press, Oxford, 1976.
[27] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[28] E.H.L. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, Wiley-Interscience Series in Discrete Mathematics and Optimization, Wiley, Chichester, 1989.


[29] L. Ingber, Simulated annealing: practice versus theory, Mathl. Comput. Modelling, vol. 18, no. 11, pp. 29-57, 1993.
[30] S.W. Stepniewski and A.J. Keane, Pruning back-propagation neural networks using modern stochastic optimization techniques, Neural Computing & Applications, vol. 5, pp. 76-98, 1997.
[31] S. Chalup and F. Maire, A study on hill climbing algorithms for neural network training, in Proc. 1999 Congress on Evolutionary Computation, 1999, pp. 2014-2021.
[32] K.D. Boese and A.B. Kahng, Best-So-Far vs. Where-You-Are: Implications for Optimal Finite-Time Annealing, Systems and Control Letters, vol. 22, no. 1, pp. 71-78, January 1994.
[33] J. Cong, T. Kong, F. Liang, J.S. Liu, W.H. Wong, and D. Xu, Dynamic Weighting Monte Carlo for Constrained Floorplan Designs in Mixed Signal Application, in Proc. ASP-DAC 2000, 2000, pp. 277-282.
[34] T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, McGraw-Hill, 1990.
[35] R.H.J.M. Otten and L.P.P.P. van Ginneken, The Annealing Algorithm, vol. 72 of The Kluwer International Series in Engineering and Computer Science, Kluwer Academic, 1989.
[36] L. Ingber, Very Fast Simulated Re-Annealing, Journal of Mathl. Comput. Modelling, vol. 12, pp. 967-973, 1989.
[37] B. Hajek, Cooling schedules for optimal annealing, Mathematics of Operations Research, vol. 13, no. 2, pp. 311-329, 1988.
[38] M. Huang and A. Sangiovanni-Vincentelli, An Efficient General Cooling Schedule for Simulated Annealing, in Proc. International Conference on Computer-Aided Design, 1986, pp. 381-384.
[39] A.B. Kahng, Classical Floorplanning Harmful?, in Proc. ISPD, 2000, pp. 207-213.
[40] H. Onodera, Y. Taniguchi, and K. Tamaru, Branch-and-Bound Placement for Building Block Layout, in Proc. ACM/IEEE Design Automation Conference, 1991, pp. 433-439.
[41] C. Sechen, VLSI Placement and Global Routing Using Simulated Annealing, vol. 54 of The Kluwer International Series in Engineering and Computer Science, Kluwer Academic, Dordrecht, 1988.
[42] W. Kruiskamp, Analog Design Automation Using Genetic Algorithms and Polytopes, Ph.D. thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 1996.
[43] K. Francken, P. Vancorenland, and G. Gielen, DAISY: a simulation-based high-level synthesis tool for ΔΣ modulators, in Proc. IEEE International Conference on Computer Aided Design, November 2000, pp. 188-192.


[44] L. Ingber and B. Rosen, Genetic algorithms and very fast simulated reannealing: A comparison, Mathematical Computer Modeling, vol. 16, no. 11, pp. 87-100, 1992.
[45] F. Balasa and K. Lampaert, Module Placement for Analog Layout Using the Sequence-Pair Representation, in Proc. ACM/IEEE Design Automation Conference, 1999, pp. 274-279.
[46] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, VLSI/PCB Placement with Obstacles Based on Sequence Pair, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, pp. 61-68, 1998.
[47] X. Tang and D.F. Wong, Fast-SP: A Fast Algorithm for Block Placement based on the Sequence Pair, in Proc. ASP-DAC 2001, 2001, pp. 521-526.
[48] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, VLSI Module Placement Based on Rectangle-Packing by the Sequence-Pair, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, pp. 1518-1524, 1996.
[49] P. Diaconis and D. Stroock, Geometric bounds for eigenvalues of Markov chains, The Annals of Applied Probability, vol. 1, no. 1, pp. 36-61, 1991.
[50] J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer Series in Statistics, Springer Verlag, March 2001.
[51] R.L. Graham, D.E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, Addison-Wesley Publishing Company, 1989.
[52] J.K. Ousterhout, Corner Stitching: A Data-Structuring Technique for VLSI Layout Tools, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 3, no. 1, pp. 87-100, January 1984.
[53] W. Pugh, Skip Lists: A Probabilistic Alternative to Balanced Trees, Communications of the ACM, vol. 33, no. 6, pp. 668-676, June 1990.
[54] D.D. Sleator and R.E. Tarjan, Self-Adjusting Binary Search Trees, Journal of the Association of Computing Machinery, vol. 32, no. 3, pp. 652-686, July 1985.
[55] R. Rönngren and R. Ayani, A Comparative Study of Parallel and Sequential Priority Queue Algorithms, ACM Transactions on Modeling and Computer Simulation, vol. 7, no. 2, pp. 157-209, April 1997.
[56] C. Martínez and S. Roura, Randomized binary search trees, J. ACM, vol. 45, no. 2, pp. 288-323, March 1998.
[57] G.M. Adelson-Velskii and Y.M. Landis, An algorithm for the organization of information, Doklady Akademii Nauk SSSR, vol. 146, pp. 263-266, 1962; English translation in Soviet Math. Dokl., 3:1259-1262.
[58] P. van Emde Boas, Preserving order in a forest in less than logarithmic time, in Proc. Annual Symposium on Foundations of Computer Science, 1975, pp. 75-84.


[59] K. Mehlhorn and S. Näher, Bounded ordered dictionaries in O(log log N) time and O(n) space, Information Processing Letters, vol. 35, pp. 183-189, 1990.
[60] R.H.J.M. Otten, What is a Floorplan?, in Proc. ISPD 2000, 2000, pp. 212-217.
[61] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, Rectangle-Packing-Based Module Placement, in Proc. ICCAD, 1995, pp. 472-479.
[62] P.-N. Guo, C.-K. Cheng, and T. Yoshimura, An O-Tree Representation of Non-Slicing Floorplan and Its Applications, in Proc. DAC '99, 1999, pp. 268-273.
[63] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani, Module Placement on BSG-Structure and IC Layout Applications, in Proc. ICCAD '96, 1996, pp. 484-491.
[64] T. Takahashi, A New Encoding Scheme for Rectangle Packing Problem, in Proc. ASP-DAC 2000, 2000, pp. 175-178.
[65] S. Sahni and A. Bhatt, The Complexity of Design Automation Problems, in Proc. IEEE/ACM Design Automation Conference, June 1980, pp. 402-411.
[66] R.H.J.M. Otten, Automatic Floorplan Design, in Proc. DAC '82, 1982, pp. 261-267.
[67] D.F. Wong and C.L. Liu, A New Algorithm for Floorplan Design, in Proc. DAC '86, 1986, pp. 101-107.
[68] D.W. Jepsen and C.D. Gelatt Jr., Macro Placement by Monte Carlo Annealing, in Proc. IEEE International Conference on Computer Design, 1983, pp. 495-498.
[69] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, B*-Trees: A New Representation for Non-Slicing Floorplans, in Proc. Design Automation Conference, 2000, pp. 458-463.
[70] Y. Pang, F. Balasa, K. Lampaert, and C.-K. Cheng, Block placement with symmetry constraints based on the O-tree non-slicing representation, in Proc. Design Automation Conference, 2000, pp. 464-467.
[71] D.E. Knuth, Selected Papers on Computer Science (CSLI Lecture Notes, No. 59), CSLI Publications, June 1996.
[72] F. Balasa, Modeling Non-Slicing Floorplans with Binary Trees, in Proc. International Conference on Computer Aided Design, 2000, pp. 13-16.
[73] X. Hong, G. Huang, Y. Cai, J. Gu, S. Dong, C.-K. Cheng, and J. Gu, Corner Block List: An Effective and Efficient Topological Representation of Non-Slicing Floorplan, in Proc. International Conference on Computer Aided Design, 2000, pp. 8-12.
[74] K. Fujiyoshi and H. Murata, Arbitrary Convex and Concave Rectilinear Block Packing Using Sequence-Pair, in Proc. ISPD '99, 1999, pp. 103-110.
[75] M.Z. Kang and W.W-M. Dai, Arbitrary Rectilinear Block Packing Based on Sequence Pair, in Proc. ICCAD '98, 1998, pp. 259-266.


[76] J. Xu, P.-N. Guo, and C.-K. Cheng, Rectilinear Block Placement Using Sequence Pair, in Proc. ISPD, 1998, pp. 173-178.
[77] S. Nakatake, M. Furuya, and Y. Kajitani, Module placement on BSG-structure with pre-placed modules and rectilinear modules, in Proc. ASP-DAC '98, 1998, pp. 571-576.
[78] K. Sakanushi, S. Nakatake, and Y. Kajitani, The Multi-BSG: Stochastic Approach to an Optimum Packing of Convex-Rectilinear Blocks, in Proc. ICCAD '98, 1998, pp. 267-274.
[79] H. Murata, K. Fujiyoshi, T. Watanabe, and Y. Kajitani, A Mapping from Sequence-Pair to Rectangular Dissection, in Proc. ASP-DAC '97, 1997, pp. 625-633.
[80] M.J.M. Pelgrom, A.C.J. Duinmaijer, and A.P.G. Welbers, Matching Properties of MOS Transistors, IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433-1440, October 1989.
[81] C. Lin and D.M.W. Leenaerts, A New Faster Sequence Pair Algorithm, in Proc. ISCAS 2000, May 2000, vol. III, pp. 407-410.
[82] J.W. Hunt and T.G. Szymanski, A Fast Algorithm for Computing Longest Common Subsequences, Communications of the ACM, vol. 20, no. 5, pp. 350-353, March 1977.
[83] X. Tang, R. Tian, and D.F. Wong, Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation, in Proc. DATE 2000, 2000, pp. 106-111.
[84] D.E. Knuth, The Art of Computer Programming, vol. 3, Addison-Wesley Publishing Company, 1989.
[85] T. Takahashi, An Algorithm for Finding a Maximum-Weight Decreasing Sequence in a Permutation, Motivated by Rectangle Packing Problem, Tech. Rep. IEICE, vol. VLD96, no. 201, pp. 31-35, 1996.
[86] E. Mäkinen, On the longest upsequence problem for permutations, Tech. Rep. A-1999-7, University of Tampere, Finland, 1999.
[87] M.-S. Chang and F.-H. Wang, Efficient algorithms for the maximum weight clique and maximum weight independent set problems on permutation graphs, Information Processing Letters, vol. 43, pp. 293-295, 1992.
[88] W.L. Hsu, Maximum weight clique algorithms for circular-arc graphs and circle graphs, SIAM J. Comput., vol. 14, pp. 224-231, 1985.
[89] P. Beame and F.E. Fich, Optimal Bounds for the Predecessor Problem, in Proc. STOC '99, 1999, pp. 295-304.
[90] G. Ramalingam and T. Reps, On the computational complexity of dynamic graph algorithms, Theoretical Computer Science, vol. 158, pp. 233-277, 1996.


[91] G. Ramalingam and T. Reps, An Incremental Algorithm for a Generalization of the Shortest-Path Problem, Journal of Algorithms, vol. 21, pp. 267-305, 1996.
[92] D. Frigioni, M. Ioffreda, U. Nanni, and G. Pasqualone, Experimental Analysis of Dynamic Algorithms for the Single Source Shortest Path Problem, in Proc. Workshop on Algorithm Engineering, 1997, pp. 54-63.
[93] K. Kozminski, MCNC benchmark data, in International Workshop on Layout Synthesis 1990, 1990, http://www.cbl.ncsu.edu/CBL Docs/lys90.html.
[94] S. Nakatake, Y. Kubo, and Y. Kajitani, Consistent Floorplanning with Super Hierarchical Constraints, in Proc. ISPD '01, 2001, pp. 144-149.
[95] X. Tang, Constrained Sequence-Pair-Based Placement, private communication, 2001.
[96] J.L. Ganley, Geometric Interconnection and Placement Algorithms, Ph.D. thesis, University of Virginia, 1995.
[97] M.R. Garey and D.S. Johnson, The rectilinear Steiner tree problem is NP-complete, SIAM J. Appl. Math., vol. 32, pp. 826-834, 1977.
[98] M. Hanan, On Steiner's problem with rectilinear distance, J. SIAM Appl. Math., vol. 14, pp. 255-265, 1966.
[99] E.W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik, vol. 1, pp. 269-271, 1959.
[100] N.J. Nilsson, Principles of Artificial Intelligence, Tioga Publishing Company, Palo Alto, CA, 1980.
[101] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving, The Addison-Wesley Series in Artificial Intelligence, Addison-Wesley, Reading, Mass., 1984.
[102] R.C. Prim, Shortest connection networks and some generalizations, Bell System Technical Journal, vol. 36, pp. 1389-1401, 1957.
[103] H.-P. Tseng, Detailed Routing Algorithms for VLSI Circuits, Ph.D. thesis, University of Washington, Seattle, 1997.
[104] C. Chiang, M. Sarrafzadeh, and C.K. Wong, Global Routing Based on Steiner Min-Max Trees, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 12, pp. 1318-1325, 1990.
[105] C.Y. Lee, An algorithm for path connections and its applications, IRE Transactions Electronic Computers, vol. EC-10, no. 3, pp. 346-365, 1961.
[106] K. Kanchanasut, A shortest-path algorithm for Manhattan graphs, Information Processing Letters, vol. 49, pp. 21-25, 1994.


[107] T. Matsumoto, N. Saigan, and K. Tsuji, Two new efficient approximation algorithms for the Steiner tree problem in rectilinear graphs, in Proc. Int. Symp. on Circuits and Systems, June 1991, vol. 2 of 5, pp. 1156-1159.
[108] E. Malavasi and A. Sangiovanni-Vincentelli, Area routing for analog layout, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 8, pp. 1186-1197, August 1993.
[109] T. Adler and E. Barke, Single step current driven routing of multiterminal signal nets for analog applications, in Proc. Design, Automation and Test in Europe Conference and Exhibition 2000, 2000, pp. 446-450.
[110] L.-C.E. Liu and C. Sechen, Multilayer Chip-Level Global Routing Using an Efficient Graph-Based Steiner Tree Heuristic, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 10, pp. 1442-1451, October 1999.
[111] U. Choudhury and A. Sangiovanni-Vincentelli, Use of Performance Sensitivities in Routing of Analog Circuits, in Proc. International Symposium on Circuits and Systems, 1990, vol. 1, pp. 348-351.
[112] J. Cong and P.H. Madden, Performance Driven Multi-Layer General Area Routing for PCB/MCM Designs, in Proc. Design Automation Conference, 1998, pp. 356-361.
[113] J.P. Cohoon and D.S. Richards, Optimal two-terminal wire routing, Integration: the VLSI Journal, vol. 6, pp. 35-57, 1988.
[114] S.Q. Zheng, J.S. Lim, and S.S. Iyengar, Finding Obstacle-Avoiding Shortest Paths Using Implicit Connection Graphs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 1, pp. 103-110, January 1996.
[115] D. Warme, GeoSteiner extensions for Steiner minimal trees in graphs, private communication, 2000.
[116] R. Dechter and J. Pearl, Generalized Best-First Search Strategies and the Optimality of A*, Journal of the Association of Computing Machinery, vol. 32, no. 3, pp. 505-536, July 1985.
[117] H. Zhou, N. Shenoy, and W. Nicholls, Efficient Minimum Spanning Tree Construction without Delaunay Triangulation, in Proc. ASP-DAC 2001, 2001.
[118] F.K. Hwang, On Steiner minimal trees with rectilinear distance, SIAM Journal of Applied Mathematics, vol. 30, no. 1, pp. 104-114, 1976.
[119] I.I. Măndoiu, V.V. Vazirani, and J.L. Ganley, A new heuristic for rectilinear Steiner trees, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1129-1139, October 2000.
[120] H. Takahashi and A. Matsuyama, An approximate solution for the Steiner problem in graphs, Math. Japonica, vol. 24, no. 6, pp. 573-577, 1980.


[121] V.J. Rayward-Smith and A. Clare, On finding Steiner vertices, Networks, vol. 16, pp. 283-294, 1986.
[122] P. Winter and J. MacGregor Smith, Path-distance heuristics for the Steiner problem in undirected networks, Algorithmica, vol. 7, pp. 309-327, 1992.
[123] E.P. Huijbregts, A Complete Design Path for the Layout of Flexible Macros, Ph.D. thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, 1996.
[124] V.J. Rayward-Smith, The computation of nearly minimal Steiner trees in graphs, Int. J. Math. Educ. Sci. Technol., vol. 14, pp. 15-23, 1983.
[125] D.M. Warme, P. Winter, and M. Zachariasen, GeoSteiner 3.0, http://www.diku.dk/geosteiner, 1999.

[126] T. Sakurai, Closed-Form Expressions for Interconnection Delay, Coupling, and Crosstalk in VLSIs, IEEE Transactions on Electron Devices, vol. 40, no. 1, pp. 118-124, January 1993.
[127] K. Doris, C. Lin, and A.H.M. van Roermund, D/A Conversion: Amplitude and Time Error Mapping Optimization, in Proc. ICECS 2001, September 2001, pp. 863-866.
[128] R. Gharpurey, Modeling and Analysis of Substrate Coupling in Integrated Circuits, Ph.D. thesis, University of California, Berkeley, 1995.
[129] H. Veendrick, Deep-Submicron CMOS ICs: From Basics to ASICs, Kluwer Academic Publishers, 2nd edition, 2000.
[130] K. Joardar, A simple approach to modeling cross-talk in integrated circuits, IEEE Journal of Solid-State Circuits, vol. 29, pp. 1212-1219, 1994.
[131] L. Deferm, C. Claes, and G.J. Declerck, Two- and Three-Dimensional Calculation of Substrate Resistance, IEEE Transactions on Electron Devices, vol. 35, no. 3, pp. 339-352, March 1988.
[132] P.N. Parakh and R.B. Brown, Crosstalk Constrained Global Route Embedding, in Proc. ISPD '99, 1999, pp. 201-206.
[133] H. Zhou and D.F. Wong, Optimal River Routing with Crosstalk Constraints, ACM Transactions on Design Automation of Electronic Systems, vol. 3, no. 3, pp. 496-514, July 1998.
[134] G.E. Forsythe, M.A. Malcolm, and C.B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, 1977.
[135] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 2nd edition, 1992.


Acknowledgement
A thesis is not complete without having given credit to those who have contributed to the contents and shape of this work, directly or indirectly. First of all I would like to thank my coach Dr. Domine Leenaerts for the fruitful discussions we had, from which many unexpected and stimulating thoughts have bloomed. Also, his criticism has surely led to a higher quality of this research work. Moreover, I am very grateful to both Domine and my former advisor Prof. Wim van Bokhoven for giving me full research freedom; a condition that has made me enjoy this work even more. I would also like to express my gratitude to my current advisor Prof. Arthur van Roermund, with whom I had many constructive in-depth discussions. Not only did he open my mind to different views towards my work, but he also motivated me to clarify several aspects, which eventually helped me to improve my own understanding of the matter. My second advisor, Prof. Ralph Otten, has also been of great help in improving the quality of this thesis.

My time in the Mixed-signal Microelectronics (MsM) research group would not have been so enjoyable without the presence of: Mrs. Linda Balvers (thanks for your support and kindness), Dr. Joost Briaire (thanks for the discussions), Dr. Hans Hegt (thank you for your openness), Mr. Piet Klessens (thanks for keeping the systems up-and-running, and for the oliebollen), and all other MsM members. There is one person who needs special mentioning because he was stuck with me in the same room for three years. Kostas Doris, thank you for being a nice companion and friend. Moreover, I appreciate your involvement, both in scientific as well as in social respects.

I have had the pleasure to tutor two students who both did a great job on part of this research work. Mario Schehle did a substantial amount of work on routing algorithms, and Lennart Reus built a nice graphical user interface. Thanks guys!

I would not be a worthy XBlast player if I would ignore those numerous fun periods that lasted longer than they should; time really flies when you are having fun. In the first place I must thank the main author of the great XBlast game, Oliver Vogel, for making the lives of many Ph.D. students a whole lot more pleasant. I thank my fellow XBlasters, not only for giving me some points now and then, but also for quite some interesting discussions. In order of disappearance from the scene: Dr. Jurgen "nu voor het echie" van Engelen, Dr. Daniël "ik pak je" Schobben, Dr. Arno "wat doe je daar" van Leest, and Dr. Eddine "ikel" Sarroukh. I believe, as yet, it is still unclear who the real Master Blaster is. As with research, the game never ends.

In addition to the people in my vicinity, there are a few persons I am indebted to for their contribution to some important parts of this research work. These people are: Dr. David Warme, Dr. Aart Blokhuis, and Dr. Thorsten Koch. Unfortunately, I cannot thank everyone explicitly, although I would like to do so. Therefore, to all persons who feel they should be mentioned in this acknowledgement but are not: you know who you are.


I eagerly take this opportunity to express my gratitude towards my wife and my sons, who have endured many, many hours of my absent-mindedness and absence due to research work. I admit that a better optimization approach is needed here. Last but not least, I thank my parents for giving me the opportunity to choose, and for their support for my choices.

Chieh Lin
December 29, 2001

Curriculum Vitae
Chieh "Achie" Lin was born on December 2, 1972 in Ruian City, China. He received his diploma from the Bouwens van der Boijecollege in Panningen, the Netherlands, in 1991. In September 1991 he started the study Informatietechniek (information technology) at the department of Electrical Engineering, Eindhoven University of Technology, the Netherlands. In June 1997 he received the Ingenieur (Ir.) degree from this institute. His final report was titled "Design and Implementation of SHANNI: a Stand-alone Hybrid Artificial Neural Network Implementation". Thereafter, he worked towards a Ph.D. degree in the Mixed-signal Microelectronics research group of the department of Electrical Engineering at Eindhoven University of Technology. Based on the work presented in this thesis, he expects to receive the Ph.D. degree on Wednesday, February 20, 2002. Since June 2001, he has been with Philips Research, Electronic Design & Tools. Currently he is focusing on the development of CAD tools (for analog simulation and synthesis), with emphasis on radio-frequency issues.

This research was supported by the Dutch Organization for Scientific Research (NWO).
