
Source: ASIC Design in the Silicon Sandbox, Keith Barr. Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com). Copyright © 2004 The McGraw-Hill Companies; any use is subject to the Terms of Use given at the website.

The Sandbox
Analog and digital circuits combined in an IC are considered mixed-signal designs.
Integrating the two types of circuitry can be challenging, but these designs can provide
system-on-a-chip (SOC) functionality that, once designed into a product, can significantly
impact final product cost. As a designer of commercially viable products, already buying
ICs from major suppliers, you could approach a major IC company and suggest that they
design a new catalog part for your application; but without some costly agreement, they
would likely offer the part to your competitors as well, somewhat dulling the advantage
you may be seeking. You can contract an IC design house to produce a design for you,
but in the process you will be transferring specific knowledge of your business to others
that you may not be able to completely control. Communication of exactly what you need
is difficult without knowledge of the IC design process; it’s like a sales guy talking to
an engineer, enough said? Further, the cost of having a design house do the work can
easily approach a million dollars, even for a fairly simple design. If you do your own
design, you can keep the details as the intellectual property of your company and get
exactly what you want, at lower cost, with well-known reasons for any trade-offs.

IC Overview
Integrated circuits are fabricated onto silicon wafers, subsequently diced or sawn into
individual die and lead bonded onto a leadframe, and then packaged with a surrounding
mineral-filled thermosetting packaging material, or in the case of a ceramic package, a lid
is attached. Depending on die size and wafer diameter, as few as 10 or as many as 50,000
devices could result from a single wafer. Every IC you currently purchase and
use in a product is produced in this way. If you take any standard, plastic-packaged IC,
lay it upside down on a piece of 220 grit sandpaper and carefully grind away the top
surface, you will ultimately begin to see the gold bonding wires appear and then the
silicon die itself. Shifting to finer sandpaper, and carefully adjusting the pressure you
apply while sanding, you will be able to prepare the part for microscopic investigation.
For this, you will require an epi-illuminated microscope of high magnification power.
Also, you will be able to measure the die size, which will give you an idea of how much
the part cost the manufacturer to produce.
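The ten-to-50,000 spread follows from simple geometry. As an illustration (a common first-order gross-die formula, not from this text; the function name is mine, and the formula ignores scribe lanes and edge-exclusion details):

```python
import math

def gross_die(wafer_diameter_mm, die_side_mm):
    """First-order gross die per wafer: usable wafer area divided by die
    area, minus an edge-loss term proportional to the circumference."""
    d, s = wafer_diameter_mm, die_side_mm
    return int(math.pi * (d / 2) ** 2 / s ** 2
               - math.pi * d / (math.sqrt(2) * s))

# A large 20 mm die on a 150 mm (6 in.) wafer vs. a small 0.8 mm die
# on a 200 mm (8 in.) wafer:
print(gross_die(150, 20))   # on the order of tens of die
print(gross_die(200, 0.8))  # tens of thousands of die
```

The exact yield per wafer depends on the fab's edge exclusion and scribe-lane widths, but the two extremes bracket the range quoted above.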

A Peek Under the Hood


An epi-illuminated microscope is often called a metallurgical microscope—one where
light is sent to the specimen through the same optics that the image returns to the
eyepieces. This is of course required in those cases where specimens are opaque, and the
low demand for such microscopes makes them hard to find and expensive. My first one
(for revealing the details of a DRAM design), purchased for about $1200, was a battle-
worn 1960s model, retired from an unknown IC inspection line. A really good epi
microscope will be mostly cast iron, weigh 50 to 100 lb, have a precision X-Y table
attached with a digital readout to 0.5 μ resolution, and cost $25,000 new, and maybe
$8000 used, in good condition (see Figures 1.1 and 1.2). You don’t have to have one to
design projects, but it can be valuable if something goes wrong with your design and you
need to probe the design to get on-chip signals.
I strongly suggest you find an epi microscope, because it can open up a new world to
you, providing insight into how other designers have solved problems. When shopping,
look for microscopes with objective lenses that have a considerable working distance but
a high numerical aperture (NA). These two characteristics are at odds from an optical
design standpoint, and basically mean that such objectives are expensive. You need the
high NA to get good resolution of small details, but you need a working distance of
maybe 8 to 15 mm to allow enough space between the objective and your IC to allow for
probe needles. An X-Y measurement table is really handy, allowing you to measure
details like die size and device dimensions with good precision.
Such microscopes often allow for both bright-field and dark-field illumination. Dark
field means the illuminating light is traveling through the same basic path as the observed
light, but at the objective it is focused to the object by a doughnut-shaped lens that
surrounds the viewing objective lens, causing the light to hit the specimen at an angle.
This may be useful in IC inspection, when looking for defects or seeing the crystal
structure of metals, but greatly increases the cost of your


Figure 1.1 An epi microscope available at low cost.

objective lenses. You won’t need the dark-field feature, and your objectives will cost much
less if you just go for the bright field only types. Typical magnifications required would be
from 100X to 1000X, which means 10X eyepieces and a few objectives, maybe 10X, 20X,
50X, and 100X. In the last case, I use an oil immersion lens intended for biological specimens,
which has no working distance at all (a drop of oil spans the gap between objective and
specimen), but this is the only way to get very high resolution (NA is greater than 1.0).
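The tie between NA and resolvable detail can be put in numbers (the Rayleigh criterion is standard optics, not from this text; the example wavelength and NA values are illustrative):

```python
def rayleigh_resolution_nm(wavelength_nm, numerical_aperture):
    """Rayleigh criterion: smallest resolvable feature separation."""
    return 0.61 * wavelength_nm / numerical_aperture

# Green light (~550 nm): a dry 0.9 NA objective vs. oil immersion at 1.25 NA.
print(rayleigh_resolution_nm(550, 0.9))   # roughly 370 nm
print(rayleigh_resolution_nm(550, 1.25))  # roughly 270 nm
```

At around a quarter micron, an oil-immersion objective is just able to resolve the drawn features of the 0.35 μ processes discussed later, which is why it is the only way to inspect the finest details.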
Although the sandpaper technique is acceptable for preparing an IC for die size
measurement, it often destroys the IC surface, in which case more drastic measures must be
taken. The standard technique for decapsulating a plastic packaged IC is to boil it in a mixture
of concentrated sulfuric and nitric acids, although I prefer near-boiling sulfuric acid alone
(97%). Most IC packages, leadframes, and all, will be completely digested by this method, but
the silicon nitride coating on the die, as well as the aluminum pads, will be preserved in
pristine condition. Only a few milliliters of H2SO4 in a small beaker on a hot plate does the
trick,


Figure 1.2 A quality epi microscope with X-Y measurement capability.

but you may need to decant off the acid (and the dissolved junk), replacing with fresh acid a
few times until the job is done. Don’t overheat the beaker, as the acid will fume into your
work area and be nasty; this is best done in a fume hood or outdoors. Of course, this is
dangerous, so be extremely careful, wash your hands thoroughly and frequently, neutralize the
acid with baking soda before tossing it away, and when it comes to local regulations about
these things, well, you’re on your own!
If you’re one of those engineers who see electronics as both a profession and a hobby,
you’ll really get a kick out of tearing ICs apart.
Thank you for tolerating my indulgence in the fun of IC design; now back to the serious
stuff.

The Basic Process


Although there are many different processes by which semiconductor devices are made, the
processes most commonly available to designers that do not own a wafer fabrication facility
(fab or foundry) are invariably CMOS. Each process that you design with has an associated set
of design rules that must be strictly obeyed. These rules must be acquired from the fab you have
chosen for your project. Often this means signing a nondisclosure agreement (NDA), as fabs
are careful about the casual distribution of their process details to competitors. Further, as
outlined in the next chapter, not all fabs will be available to you; it depends on your
company’s size and your project’s meaning to the fab in terms of overall business. This
only means you will need to find a broker for processes that you cannot access directly.
Designing a part to a fab’s rules and having the fab produce wafers for you puts you in the
position of being a fabless semiconductor company. As such, you will have several fab houses,
and numerous processes from which to choose. Each fab has developed each of its processes
to work reliably when the rules are obeyed, and can supply detailed information about the
character of the resulting structures you may wish to use in a design. When you design with
discrete components, you don’t need to know how the parts actually work, you only need to
know how they behave. This is also true when you design an IC, as the details of the process
are carefully worked out at the fab by process engineers prior to the fab making the process
available. As a designer, you must accept the process as it was developed; you can’t change
the process, so it is not necessary to know the details of the process. You don’t have to be a
semiconductor physicist to design an IC, any more than you needed to be a transistor designer
to use one. You will, however, be able to use the fab’s documentation and device models to
determine, to a high degree of accuracy, how the resulting structures behave. This is much like
designing with discrete parts, except you can order up just about any part you want, complete
with data sheets that you generate from the fab’s process data.
Let’s begin our understanding of the sandbox by looking at the process of manufacture a
bit, and filling in some relevant details, as required.
First of all, the CMOS processes you will encounter will all be fabricated in silicon, and of
either N- or P-type starting material, called a wafer or the substrate. Although N type wafers
have been used in the past, the use of P-type substrates predominates today, possibly because a
P substrate will be at ground potential in a system that operates from a positive supply. In the
past, the use of an N substrate for general purpose ICs required that the substrate be connected
to the positive supply. I suppose you can do this, but would you want to? I will, generally,
refer only to P substrate processes here. Wafers are available in different diameters, and each
process uses a wafer diameter that fits the equipment the fab builds that process on. Wafers
can range from 4 to 12 in. in diameter, but most CMOS processes of interest to ASIC
designers will be either 6 or 8 in. Wafer thickness is usually about 0.75 mm, which is required
to give the wafer strength during handling, but wafers are often back lapped or back ground to thinner dimensions just prior to packaging
into low-profile packages. At the point of packaging, the wafer may have been reduced in
thickness to approximately 0.25 mm (250 μ).
All fab processes are done to only one side of the wafer. Diffusions (using an implant
process) are dopant impurity atoms driven into the silicon at high velocity and then diffused
into the silicon at high temperatures. These diffusions constitute connections to the substrate
or diode junctions within the substrate, depending on doping polarity. Insulation is provided
by simply oxidizing the wafer in an oven at high temperature, turning silicon (semiconductor)
to silicon dioxide (excellent insulator), or through the deposition of silicon oxides or nitrides,
as required. These insulations are, for all practical purposes, perfect; unlike insulations
encountered in PCB design, which can suffer from adsorbed moisture, a wire encased in the
insulations normally found in an IC process will have zero leakage to adjacent wires.
All of the active devices available, including diodes, bipolar transistors, and MOSFETs (metal oxide
semiconductor field effect transistors), utilize only a few different types of diffused junctions.
Metal connections to the junctions and the gates of MOSFETS are provided by additional
layers, deposited and patterned onto the surface of the wafer and insulated by added insulation
layers. Polysilicon is a somewhat resistive conductor, but can withstand high processing
temperatures, and is found universally as the gate material for MOSFETS, while other
interconnecting conductor layers are chiefly composed of aluminum, often sandwiched
between more refractory metals. The entire set of layers with which you must be concerned is
quite limited and once the basic process is understood, immediately obvious.
An N-type diffusion in a P-type substrate constitutes a substrate diode; the substrate is the
diode’s anode, the N diffusion is the diode’s cathode (shown in Figure 1.3). A connection
to an N diffusion in the substrate will conduct current to the substrate if brought to a negative
potential (relative to the substrate), but the diode will allow positive potentials, as the diode
will be reverse biased. Such diodes are infrequently used as diodes, for reasons that we will
discover later, but they are inherent in all N-type MOS (NMOS) devices. Therefore, both the
source and drain terminals of an N device (which are N diffusions) have parasitic diodes to
substrate that cannot be avoided. The diffusion that causes this diode (or NMOS terminal) is a
fairly conductive diffusion, and very shallow, on the order of a few tenths of a micron (1
micron = 1 μ = 1 μm = 0.001 mm = 1E−6 m), and is defined by the combination of an active
area mask and the N implant mask.
A second N-type diffusion is employed to create N well areas in the P substrate, for the
purpose of establishing an opposite polarity substrate


Figure 1.3 Illustration of N diffusion in P substrate, cross-sectional view.

within which PMOS devices can be fabricated. This is simply called the NWELL, and is of
much lower conductivity than the N implant, and is diffused quite deeply into the substrate
(several microns). The N wells are normally (but not always) connected to the IC’s positive
supply. N wells are also diodes within the substrate, just like the N diffusions.
Within the N well, PMOS devices are fabricated with P-type diffusions, in the same way
NMOS devices are built onto the P substrate with N diffusions. The P implant in an N well
gives rise to a well diode, much like the substrate diode, but in this case, it is also a bipolar
transistor; the P diffusion acts like the emitter, the well like the base, and the substrate like the
collector of a PNP transistor. This is called a dedicated collector bipolar device, since the
collector is permanently connected to the substrate. PMOS devices can only be constructed in
an N well, and have unavoidable well diodes at their source and drain terminals. These diodes
conduct to the well (supply) only when the P diffusion within the well is brought to a potential
that is more positive than the supply that is connected to the well, but allow lower potential to
be applied, even potentials that are negative with respect to the substrate. (Note: The term
“well diode” could also be used to describe the PN junction between the well diffusion and
the substrate.)
Electrical connections are made to the substrate through P diffusion regions, N wells are
connected to supply through N diffusion regions, and the use of diffusions in opposite polarity
material (N in substrate or P in well) constitute diodes or, most commonly, MOSFET
connections. All connections to the silicon itself are only done through either N- or P-diffused
areas. This is really quite simple.
Field oxide (FOX) is grown in an oxygen atmosphere selectively into the silicon to insulate
gate poly from the silicon surface. All areas that are not FOX are called active area (AA), and
are implanted with either N or P dopant or covered with polysilicon gate. In Figure 1.4, P+
and N+ represent the heavily doped regions, to distinguish from the weaker well diffusion
marked N, or the substrate material marked P.


Figure 1.4 Cross-sectioned view of both P and N diffusions, as substrate and well
connections and also MOSFET source and drain connections.

Contact can only be made to substrate or well through P or N diffusion, respectively. Attempts
to contact a metal layer to the substrate or well without a proper diffusion will violate foundry
rules. The IC design tools you use can quickly show any instances of such rule violations.
The transistor gates of Figure 1.4 are shown end-on, insulated from the silicon by a very
thin oxide layer called the thin oxide layer (TOX). This oxide layer is also thermally grown,
and controls the effect the gate potential has on establishing conductivity between the source
and drain terminals, which are, by the way, indistinguishable. The MOSFETs in CMOS
processes are symmetrical, source and drain being interchangeable. Figure 1.5 shows a top
view of these features, and Figure 1.6 shows a cross-sectioned view orthogonal to that of
Figure 1.4.
The gate oxide layer, thermally grown onto the active area of the silicon surface, is
extremely thin, measured in angstrom units (1 Å = 0.1 nm). The gate oxide of a typical 0.35 μ
CMOS process is 70 Å thick.
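That 70 Å figure implies a sizeable capacitance per unit of gate area. A sketch using the ordinary parallel-plate formula (the permittivity constants are standard physics values, not from this text; the function name is mine):

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
EPS_SIO2 = 3.9     # relative permittivity of silicon dioxide

def gate_cap_fF_per_um2(tox_angstrom):
    """Parallel-plate gate capacitance per square micron of channel area."""
    tox_m = tox_angstrom * 1e-10          # angstroms to meters
    c_per_m2 = EPS0 * EPS_SIO2 / tox_m    # F/m^2
    return c_per_m2 * 1e15 * 1e-12        # F/m^2 -> fF/um^2

print(gate_cap_fF_per_um2(70))  # close to 5 fF/um^2 for a 0.35 u process
```

This is several times the poly-poly capacitance density quoted later, which is consistent: the gate oxide is much thinner than the inter-poly oxide.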

Figure 1.5 Top view of contact diffusions, transistors, and well feature.


Figure 1.6 Cross-sectioned view of poly gate as it transitions onto field oxide from a
transistor active area.

Implanted active area adjacent to a poly gate, acting as contacts to a MOS device, is
fundamental to any modern CMOS process. The great improvement that allowed dense
CMOS circuits to operate at extremely fast switching speeds was the invention of the self-
aligned gate, where the gate material is deposited and patterned over active area, and only
then is N and P implantation done, to ensure that the source and drain diffusions are precisely
aligned with the gate material. The FOX and the polysilicon gate act as masks that block
implantation. This technique minimizes the overlap of gate and drain regions that could
constitute a Miller capacitance, and allows significant tolerances in the positioning of the gate
material. When drawing transistors, the active area is a continuous block that crosses the
drawn gate area (shown by a dotted line in the top view). The implantation of active area that is
crossed by a polysilicon gate creates a transistor.
Details remain, but this really is simple, probably simpler than you had imagined.
The gate poly layer has a resistivity that can be controlled by the application of implants to
provide a wide range of possible values (resistors), and a second poly layer can be applied
(depending on the process), which allows for poly-poly capacitors—two stacked conducting
layers separated by a very thin oxide layer. The diffusions, or even the well, can be used as a
resistive material, and stacks of metal layers with insulation between can also be used as
capacitors. A wide range of useful structures can be built using these few patterned layers.
The critical first steps of wafer fabrication, the “front end” of the process, are over as soon
as the implants are done and a protective layer of silicon dioxide is applied. In certain process
variations (later) other layers may be added, but basically the remaining operations (back end)
are the etching of insulation to allow contact with lower layers and the deposition and
patterning of metal, in successive layers, to interconnect the N and P transistors. Sure, there
are details, but thankfully most of the really complicated stuff is done by the fab, without the
designer being concerned.
As shown in Figure 1.7, a layer of insulation is deposited, and contact “holes” are etched
for connection by the first layer of metal, called M1


Figure 1.7 Cross-sectioned view of basic CMOS structures.

(contacts to gate are not shown). Only two additional layers need be drawn to achieve this
level of interconnection: CNT (contact holes) and M1. Not shown are subsequent wiring
layers that can be added; VIA will define holes in the second insulation layer to connect M1 to
M2, VIA2 will connect M2 to M3, and so on. The top level is always a passivation layer of
silicon nitride, a particularly hard, chemically inert material that has areas etched away to
expose bonding pads so that the chip may be electrically connected into a finished package.

Masks
All of the features on the surface of an IC are defined by photomasks. Each layer requires a
unique mask that is produced from the designer’s drawing. A mask is a very precise block of
transparent, optically flat material (fused silica) upon which is deposited a thin layer of metal
(chromium) to selectively block the passage of light. Typically, masks are scaled to five times
the desired dimension on silicon, so that in production, a photo imager can project the mask
pattern onto the wafer through a 5:1 reducing lens. These lenses are extremely expensive, and
photo imagers can constitute a significant fraction of a fab’s initial investment.
The area imaged onto the wafer is usually a square of about 20 mm on a side, requiring the
mask to have a patterned area of about 100 mm by 100 mm (4 in. sq). The entire wafer is
exposed by a given mask’s pattern through a step-and-repeat process, until the entire wafer
is covered. As a result, the imager is often called a stepper. If a design is very large, it could
cover the entire mask field, but smaller designs would have many copies of the design
precisely arranged to fill the maximum imaging area.
The masks are very expensive, particularly for the finer line processes; each mask is
produced with an electron beam mask writer that affects a thin layer of photoresist, which,
when developed, allows the selective etching of the thin metal layer to produce the finished
mask. The masks are then carefully inspected, and if a flaw is found, a repair is made or a new mask is produced
until one is found to be defect free. Depending on the density of the design, a mask can take
many hours to produce on a very expensive mask writing machine.
It must be appreciated that the area imaged onto a wafer is always about 20 mm on a side, in
all processes. For a 0.6 μ process, and a 0.1 μ manufacturing “grid,” the total number of
possible grid points across the mask would comprise a 200,000 by 200,000 array. At the finer
process levels, say a 90 nm process with a 0.01 μ grid, this changes to a 2,000,000 by
2,000,000 array, which is 100 times more detailed. Fine line masks are expensive, as the work
required to produce them increases as the square of the linear detail density.
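The grid arithmetic can be checked directly (a trivial sketch; the function name is mine):

```python
def grid_points_per_side(field_mm, grid_um):
    """Number of manufacturing-grid positions across a mask field."""
    return int(field_mm * 1000 / grid_um)

coarse = grid_points_per_side(20, 0.1)     # 0.6 u process, 0.1 u grid
fine = grid_points_per_side(20, 0.01)      # 90 nm process, 0.01 u grid
print(coarse, fine, (fine / coarse) ** 2)  # the last factor is the
                                           # 100x increase in areal detail
```

Since mask-writing effort scales with the number of addressable grid points, that areal factor is what drives the cost of fine-line masks.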
The wafers to be imaged are first coated with a photoresist polymer, which is affected
selectively by the imaging process; developing the exposed wafer with various solutions
leaves the wafer with the remaining resist in the desired pattern. During subsequent
operations, the patterned resist can be used to block implants or allow previously deposited
layers to be selectively etched away. The resist is then removed.

CMOS Layers
A typical, simple CMOS process would involve the use of 12 to 16 masks, depending on the
number of metal layers and added features like double poly or resistor implants. Depending on
the technology, a mask set can cost from about $15,000 for older technologies, to well over a
million dollars, for cutting edge processes.
To get an idea of how many layers are needed, and, therefore, how many need be drawn,
consider this simplified list, and try to imagine the structures that each mask forms:

■ NWELL well definition
■ AA active area (simultaneously defines FOX)
■ POLY polysilicon gate
■ NIMP N+ implant
■ PIMP P+ implant
■ CNT holes in first insulation
■ M1 first metal layer
■ VIA holes in second insulation
■ M2 second metal layer
■ PAD holes in passivation for bonding pads


This is only 10 layers, but others are often required to define features that enhance the
process by providing extra layers of poly or metal, while other extra layers are often required
to properly complete the basic process. These extra masks are often derived from the above
drawn layers by the foundry, or required as copies of a given drawn layer from the designer.
For a simple, 2-metal logic process, the above layer list is the minimum that should be
required to be drawn by the designer. Considering the straightforward nature of the structures,
as they have been described so far, this is a fairly simple concept to grasp.

Process Enhancements
The resistivity of bulk silicon used in modern CMOS processes is on the order of 20 Ω-cm.
The resistivity of implanted regions is on the order of 10 mΩ-cm, which seems quite
conductive, but when one realizes the extraordinary thinness of the implanted regions, a sheet
resistivity of several hundred ohms per square is realized. Further, the polysilicon used as gate
material can have a typical sheet resistivity of 30 to 40 Ω/sq, making long runs of polysilicon a
bad choice for quickly changing signals. A 1-mm run of 0.5-μ wide polysilicon could measure
70,000 Ω end to end. The silicide process allows a refractory metal to be diffused into the
silicon structures to significantly reduce the resistance of both polysilicon and diffused regions
down to several ohms per square, but is often masked-off in certain structures where the native
resistivity of the material is desired. This silicide block layer is available in many processes.
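The 70,000 Ω figure is just the squares count at work: end-to-end resistance is sheet resistivity times length over width. A minimal sketch (the helper name is mine; 35 Ω/sq is the midpoint of the 30 to 40 Ω/sq range quoted above):

```python
def run_resistance_ohms(sheet_ohms_per_sq, length_um, width_um):
    """End-to-end resistance of a conductor run from its sheet resistivity."""
    return sheet_ohms_per_sq * length_um / width_um

# 1 mm of 0.5-u wide unsilicided poly: 2000 squares at 35 ohms each.
print(run_resistance_ohms(35, 1000, 0.5))  # 70,000 ohms
```

With silicide bringing the sheet resistivity down to a few ohms per square, the same run drops to a few thousand ohms, which is why the silicide process matters for fast signals.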
The resistivity of polysilicon is usually made as low as possible by doping the layer with
impurities to increase its conductivity. If left undoped, however, polysilicon can have very
high resistivity, on the order of several megohms per square. An extra undoped poly mask may
be required to block the doping, allowing high-valued resistors.
The gate oxide that rests below the poly gate, under which a channel is formed by the
potential of the gate to allow a transistor to conduct, is as thin as possible so that the gate may
have the greatest influence on the underlying silicon. All processes are designed to run at a
particular maximum voltage, partly due to the breakdown potential of TOX. In very low
voltage processes, a second TOX thickness can be selected through a thick oxide mask,
allowing devices to be drawn that can tolerate higher potentials, usually as I/O devices. The
use of a thicker oxide, however, yields devices with unsatisfactory threshold characteristics, so
it is common to have, along with the thick oxide mask, a threshold adjust implant mask as
well. This allows the designer several choices when applying these masks in combination to
devices.


Figure 1.8 Cross-sectioned view of a vertical NPN transistor with additional P-type base
implant.

If your design requires capacitors, stacks of insulation sandwiched by metal can provide
small capacitors, but larger caps will require a second poly mask. The oxidation of the surface
of a poly layer with a second layer of poly deposited and patterned above can provide much
higher capacitance values than the metal stack version, but requires at least one additional
mask. Some processes allow metal-insulator-metal (MIM) capacitors, where a mask layer can
be used to make the insulation between two metal layers extremely thin. Capacitors of this
type are useful in RF designs where the resistance of the poly layers in poly-poly capacitors
could degrade resonator Q values.
All P-substrate (NWELL) CMOS processes have the dedicated collector PNP bipolar
device as a natural feature, but the NWELL can be used as the floating collector of a vertical
NPN by the addition of a P-type base layer, as shown in Figure 1.8.
These devices are usually well characterized by the foundry as specific-sized devices. The
floating collector NPN can be used in analog multiplier circuits and low noise amplifiers.
Bipolar devices are more desirable than MOSFETS in certain applications.
Finally, some processes involve extra masks to allow for very high-voltage devices. Usually
these devices are very carefully constructed, and you may need to coordinate very closely with
the fab to get good, reliable results.

A Completely Different Scale


When transitioning from the board level to a custom ASIC, the first shock to overcome is that
of scale. Your complete chip may be as small as 2 mm on a side, and contain 100,000
transistors. Circuits on this ultra-small level are different from their PCB counterparts, simply
on account of size. The capacitance at the gate of a small MOSFET could be as low as 1 fF
(femto = 1E−15); the capacitance at the pin of a device on a PCB is often 5000 times as
great. Leakages are similarly low: the diode junctions at source/drain terminals often show
leakages on the order of hundreds of atto-amps (atto = 1E−18). The capacitance of a
connection wire, conducting a signal from one place to another on-chip, is roughly
100 aF/μm or about 0.1 fF/μm; you run a line halfway across your design and
its capacitive loading is still tiny. Transistors can be sized to conduct anywhere from femto-
amps to amperes, depending on their sizing and bias conditions. Resistors can range from
near-zero to tens of megohms (with a hi-res mask), but capacitors are often frustratingly small,
poly-poly caps larger than tens of picofarads being real space-wasters. Typical poly-poly cap
or MIM cap values are on the order of 1 fF/μm².
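To get a feel for these magnitudes, a quick sketch (variable names are mine; the 0.1 fF/μm wire capacitance and 1 fF/μm² capacitor density are the figures quoted above):

```python
WIRE_CAP_FF_PER_UM = 0.1  # on-chip wire capacitance per micron of length
CAP_FF_PER_UM2 = 1.0      # poly-poly or MIM capacitance density

# A line run halfway across a 2 mm chip (1000 um):
wire_cap_fF = 1000 * WIRE_CAP_FF_PER_UM   # on the order of 100 fF

# Silicon area needed for a 10 pF (10,000 fF) poly-poly capacitor:
cap_area_um2 = 10_000 / CAP_FF_PER_UM2    # 10,000 square microns
print(wire_cap_fF, cap_area_um2 ** 0.5)   # a square ~100 um on a side
```

A 100 μm square for a single 10 pF cap is a noticeable bite out of a 2 mm die, which is exactly the "space-waster" problem described above.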
This entirely different scale requires a new understanding, because in many cases,
especially with analog filters, you must either bring out pins on the IC to connect to external
capacitors, or find new ways to use the tiny ones available on-chip.
Not all is perfect with resistors either. Typically, resistance values are difficult to control
with precision, a worst case variation of ±40% being common, and they have poor
temperature coefficients too. The positive side, however, is that the resistors match very well,
on the order of 0.1%, or better. In many cases you don’t really care about the exact resistance
value, but matching to other resistors, such as in a voltage divider, is critical. Capacitors also
match well, but their value could vary by ±10% from run to run.
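To see why matching matters more than absolute accuracy, here is a quick sketch with made-up but representative numbers: even when both resistors in a divider run 40% high, a 0.1% mismatch between them moves the tap ratio by only about 0.05%.

```python
# Sketch: why resistor matching matters more than absolute accuracy.
# All values here are illustrative, not from any particular process.

def divider_ratio(r_top, r_bottom):
    """Output fraction of a two-resistor voltage divider."""
    return r_bottom / (r_top + r_bottom)

nominal = 10e3          # 10-kohm drawn value for both resistors
process_shift = 1.40    # both resistors run +40% (worst-case absolute error)
mismatch = 1.001        # but they track each other to ~0.1%

r_top = nominal * process_shift
r_bottom = nominal * process_shift * mismatch

ratio = divider_ratio(r_top, r_bottom)
error = abs(ratio - 0.5) / 0.5   # deviation from the ideal 50% tap

print(f"divider ratio = {ratio:.5f}, error = {error * 100:.3f}%")
```

The 40% shift cancels entirely in the ratio; only the 0.1% mismatch survives, which is exactly why ratioed circuits dominate on-chip analog design.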
Because of the small size, on-chip conductor inductance only becomes important in very
high-frequency RF designs, and the only serious inductance to be considered is that of the
bonding wires to the package and the package leadframe conductors. The on-chip issues are
almost entirely those of transistor drive current, the resistance of metal connections, and load
capacitance; it's only an RC consideration, which greatly simplifies dynamic
calculations.
The metal layers used are on the order of 0.5 μm thick, and can be placed with 0.5-μm
spacing (in a typical 0.35-μ process), so the capacitance between parallel conductors can be
as influential as that between a given conductor and the substrate; this can cause problems, but
once understood, steps can be taken to minimize coupling between conductors or to avoid the
situation altogether. The thinness of the conductor layers gives rise to a metal sheet resistance
of perhaps 0.1 Ω/sq, so a 0.5-μm-wide metal run that goes halfway across your 2-mm chip
has a resistance of 200 Ω. This may be OK, since the line is probably only driving a total load
of a few hundred femtofarads, and the total time constant for signal propagation is on the
order of 40 ps. The resistance and capacitance associated with signal propagation generally
dominate; signal lines on chip do not require analysis as transmission lines, for they
are generally quite lossy, spectacularly short, and do not require termination. This is a
significant difference from PCB level designs where signal line termination can be critical.
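The RC arithmetic above is easy to verify. A minimal sketch using the chapter's rough figures (0.1 Ω/sq metal, 0.1 fF/μm of trace capacitance, plus an assumed ~100 fF of gate loading at the far end):

```python
# Back-of-envelope RC check for the on-chip run described above.
# Numbers are the chapter's rough figures, not any fab's actual data.

sheet_res = 0.1       # ohms per square, thin CMOS metal
width_um = 0.5        # drawn trace width, um
length_um = 1000.0    # halfway across a 2-mm die, in um

squares = length_um / width_um      # 2000 squares of metal
r_total = squares * sheet_res       # 200 ohms total

cap_per_um = 0.1e-15                # ~0.1 fF per um of trace
c_trace = cap_per_um * length_um    # ~100 fF of wire capacitance
c_load = 100e-15                    # assumed ~100 fF of gate loading

tau = r_total * (c_trace + c_load)  # lumped-RC time constant
print(f"R = {r_total:.0f} ohms, C = {(c_trace + c_load) * 1e15:.0f} fF, "
      f"tau = {tau * 1e12:.0f} ps")
```

With 200 Ω and ~200 fF the lumped time constant lands at 40 ps, matching the figure quoted in the text.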
Most folks who dive into this tiny world for the first time require a few moments to adjust to
the scale of it all, often punctuated with comments like "whoa" and "ahh." Once
you are comfortable with it, your understanding of electronics will have broadened, and new
ideas will hopefully come to mind.

The Available Parts


OK, so we have NMOS, PMOS, PNPs with their collectors permanently tied to ground,
resistors, and really small capacitors, but is that all? There are other structures that you can
consider: NMOS devices that can operate at higher drain potentials through the addition of
complicated and bulky extra drawn features; the floating NPN, if you use a base mask;
lateral PNP devices, which are really just PMOS devices in a well that acts as a
base connection; and even lateral NPN devices, where the base (substrate) is permanently
grounded. And that, basically, is it. The beauty is that many useful circuits can be made from
this small assortment of easily understood devices.
Although this is only intended for the purpose of introduction, let’s look at the NMOS and
PMOS devices to develop an appreciation for their operation, which will hopefully inspire
some thoughts about possible applications.

The MOS transistor


The MOS transistor is a four-terminal device: drain, gate, source, and body. The substrate is
always at ground in a P-substrate process. For NMOS devices, the body terminal is always the
substrate, and in logic circuits the source is usually tied to ground. For PMOS devices,
the body terminal is the NWELL, and in logic circuits the source is usually tied to the supply,
along with the NWELL. In analog applications the source and drain terminals may both be
at potentials other than the body potential, which introduces the body effect into the otherwise
fairly simple gate-voltage/drain-current relationship. The effect is slight, but may be influential
in analog applications.
The MOS transistors encountered in CMOS processes are enhancement devices; that is,
a conductive channel is induced in the surface of the silicon immediately under the gate,
bridging the gap between the source and drain of the device, when the gate potential (Vg) is
substantially positive with respect to the source terminal in the case of NMOS, or negative
with respect to the source in the case of PMOS.

When the gate is at the source potential, the device is essentially off. Respecting that the source
and drain terminals are interchangeable, I will refer to the terminal nearest its supply rail
(nearest ground in the NMOS case, nearest the positive supply in the PMOS case) as the source.
The threshold voltage (Vt) is the gate-to-source potential that defines a
specific bias condition and, basically, three areas of device operation. Gate potentials below Vt
define a region of operation called weak inversion (also called the subthreshold region), and
gate potentials above Vt define either strong inversion (also called the saturation region),
when the drain potential (Vd) is high, or linear operation (also called the resistive region), if
the drain potential is relatively low.
The drain potential that delineates between saturation and linear operation is called the
saturation voltage, or Vdsat.

In the saturation region, when Vd exceeds Vdsat, the drain impedance is high; that is,
variations in drain potential have only a slight effect on drain current (Id). In the linear mode,
when Vd is less than Vdsat, the drain current varies linearly with Vd, and the device behaves
like a resistor. In saturation mode, the drain current is roughly proportional to the square of
Vdsat.
Subthreshold operation is quite useful for low power analog circuits, where the MOS device
acts very much like a bipolar transistor, but with the advantage of zero gate current. In this
area of operation, the Id/Vg curve is exponential, much like the bipolar device; however, as
bipolars show a decade of collector current increase for approximately every 60 mV of base
voltage increase, MOS devices increase drain current by a decade for approximately every 90
mV of gate voltage increase. This is called the subthreshold slope, and varies only slightly
from process to process.
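The decade-per-90-mV behavior can be sketched directly. The reference current I0 below is arbitrary, chosen only for illustration; it is not a process parameter:

```python
# Subthreshold sketch: drain current rises one decade for every ~90 mV
# of added gate drive (vs ~60 mV/decade for a bipolar's base voltage).
# I0 is an arbitrary reference current, purely illustrative.

def sub_id(vgs_delta, i0=1e-12, slope_mv=90.0):
    """Subthreshold drain current, decade-per-slope_mv exponential model.

    vgs_delta is the gate drive relative to the bias at which Id = i0.
    """
    return i0 * 10 ** (vgs_delta / (slope_mv * 1e-3))

i_a = sub_id(0.000)   # reference bias: ~1 pA
i_b = sub_id(0.090)   # +90 mV of gate drive: ~10x more current
i_c = sub_id(0.180)   # +180 mV: ~100x more current
print(i_a, i_b, i_c)
```

Swapping slope_mv for 60 turns the same model into the bipolar case, which is why subthreshold MOS circuits borrow so freely from bipolar design practice.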
I’ve just stated the characteristics of the MOSFET in very compact terms, so you may
want to reread the above few paragraphs until the concept is really clear to you. In fact, the
MOSFET is a fairly simple device, but because of these three regions of operation it is
actually more flexible than its bipolar cousin.
Due to the similarity between MOS devices operated in the subthreshold region and bipolar
devices, analog multipliers, bandgap references, and temperature measuring devices can be
constructed similarly to their bipolar counterparts. Later I’ll give examples, and you’ll see
how simple these circuits are.
Typical threshold potentials for 0.35 to 1 μ CMOS devices are in the range of 0.6 to 0.9 V.
The threshold voltage shows a negative temperature coefficient of about 2 mV per degree C,
similar to bipolar devices.

The “strength” of a transistor, that is, how high a drain current can result from a given
gate voltage, is determined by the width/length ratio of the transistor gate. The gate length (L)
is the dimension in the direction of current flow between source and drain, and is the primary
dimension from which a given process gets its name; a 0.35-μ process normally means that the
minimum gate length is 0.35 μ. Shortening the gate length increases the current conducting
capability of the structure, makes circuits run faster as a result of increased current drive and
lowered gate capacitance, and is the key driving force in reducing geometry sizes for all IC
processes. The gate width (W) is the measurement of the gate material (over active area) in the
orthogonal direction; increasing gate width proportionally increases drain current, as though
multiple devices were placed in parallel.
Roughly, Id is proportional to (Vg − Vt)^2 × W/L in the saturation region. Of course, when
Vg = Vt, a current still flows, indicating that these rules are approximate. The transitions
among the three regions of operation are, in reality, smooth and continuous.
An NMOS device with L = 0.6 μ and W = 1 μ with the drain and gate both at +5 V will
conduct about 0.5 mA. With the drain at +5 V and the gate at threshold, Id will be perhaps 0.1
μA. The gate capacitance in a 0.6-μ process will be about 2.5 fF/μm². This is substantially greater
than the poly-poly capacitance, making simple MOSFETS attractive in noncritical
applications, such as supply bypass devices, where the device is referred to as a MOSCAP.
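As a rough check, the square-law relationship can be fitted to the example device above. The k value below is simply chosen so the model lands near the quoted 0.5 mA for this geometry; it is an assumed fitting constant, not a parameter from any fab:

```python
# Square-law sketch of saturation current: Id ~ k * (W/L) * (Vg - Vt)^2.
# k and Vt here are assumed illustrative values chosen so the example
# device (W/L = 1/0.6, Vg = 5 V) lands near the chapter's quoted 0.5 mA.

def id_sat(vg, vt=0.8, w=1.0, l=0.6, k=17e-6):
    """Rough saturation-region drain current in amps (illustrative k)."""
    if vg <= vt:
        return 0.0   # the simple model cuts off; a real device does not
    return k * (w / l) * (vg - vt) ** 2

i_on = id_sat(5.0)           # close to 0.5 mA for the example geometry
i_wide = id_sat(5.0, w=2.0)  # doubling W doubles the current
print(i_on, i_wide)
```

Note that the model returns zero at Vg = Vt, while the text points out a real device still conducts there; the subthreshold exponential takes over where the square law gives up.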
PMOS devices behave similarly, but are somewhat less conductive, requiring two to three
times the W/L ratio to match their NMOS counterparts in terms of resulting Id. This is due to
the decreased mobility of holes (the current-carrying mechanism in PMOS), which in
silicon is about one-third that of electrons (the current-carrying mechanism in NMOS). If you
need really strong current drivers, arrange your system such that the job can be done with
NMOS devices pulling down, not PMOS pulling up.

SPICE Modelling
One cannot design reliable circuits using the above information alone, but fortunately, a
SPICE simulation program (Simulation Program with Integrated Circuit Emphasis) can be
used to fully analyze drawn devices and accurately predict how the structures will behave.
Estimates of required device sizes can be derived from the device characteristics described
above, but hand calculation of all the possible variables is unreasonable, even for simple circuits.
Details of the process and expected device nonlinearities are well documented by every
foundry in the SPICE models, which will be supplied along with design rule information.

Using SPICE to analyze your circuits will quickly and simply verify how close your initial
estimates were, and allow circuit modification until the expected results are achieved.
Typical details that SPICE will handle for you (which are mind-numbingly complex)
include:
■ The body effect
■ The effect of gate length on drain impedance
■ The effect of extremely narrow-width devices
■ The effect of actual gate length and width on threshold voltage
■ The capacitance of all structures
■ The smooth transition between the three modes of operation
■ The effects of temperature on all parameters
■ Leakages in source/drain diodes
■ Resistivity of source/drain connections
■ The nonlinearities of all device parameters
Experience with the simulation of circuits will give you a better appreciation of these effects
and will ultimately improve your initial guesses. The SPICE simulator has wonderfully
improved the ability of engineers to work out problems such as these, which cannot be done
through experimentation alone. The time required to complete a design, submit the files for
prototyping, and analyze the results could be a cycle that is many months long. SPICE allows
reasonable confidence in your design for first-silicon success.
In fact, even for engineers who still use parts and solder in bench experiments, I strongly
suggest getting a SPICE package. Models are now being offered, if reluctantly, by discrete
component manufacturers, and more problems can be solved in a day at the simulator than in a
week of bench experimentation (and it's just as much fun, if not more).
The foundry models will often be supplied in five flavors: typical, fast N/fast P, fast N/slow
P, slow N/fast P, and slow N/slow P, where fast and slow refer basically to the current drive
capability of the devices, which can vary from lot to lot. Especially for high-speed logic
circuits, SPICE analysis using these models, along with SPICE tests at different supply
voltages and temperatures, will significantly improve your chances of the parts working the
first time.
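One way to organize such worst-case testing is to enumerate every combination of corner, supply, and temperature up front, then run one simulation per entry. The corner names follow the chapter; the voltage and temperature points below are illustrative choices, not fab requirements:

```python
# Sketch of a worst-case verification matrix: every model corner crossed
# with supply and temperature extremes. Corner names follow the text;
# the voltage/temperature points are assumed, not a fab requirement.

from itertools import product

corners = ["typical", "fastN_fastP", "fastN_slowP", "slowN_fastP", "slowN_slowP"]
supplies_v = [4.5, 5.0, 5.5]   # nominal 5 V, +/-10%
temps_c = [-40, 27, 85]        # industrial extremes plus room temperature

runs = [
    {"corner": c, "vdd": v, "temp": t}
    for c, v, t in product(corners, supplies_v, temps_c)
]

# Each entry would become one SPICE run (corner model library, .temp
# statement, and supply source value set accordingly).
print(f"{len(runs)} simulation runs to cover the matrix")
```

Five corners times three supplies times three temperatures gives 45 runs; a script like this is a common way to keep the sweep honest rather than hand-picking a few cases.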
You may, out of curiosity or necessity, look into a SPICE model and make occasional
modifications. When you do, you must be sure that what you're affecting is well understood,
or false results can lead to project disaster.

Limitations
The IC processes available to the independent IC designer today were not available 20 years
ago. Back then, IC processes were scattered across the board, a mixture of bipolar, high
voltage, metal gate CMOS and some silicon gate CMOS with very long gate dimensions
(several microns). Companies that had invested in fabrication facilities only produced their
own product on these production lines, using their own closely held technology, which was
often quite different from a competitor’s technology. Today, the processes have matured to
the point where CMOS processes have become almost generic. This is not to say that bipolar
processes no longer exist; they are just offered by a very few small fabs, and are extremely
specific to each fab. This is also not to say that a CMOS process at 0.6 μ in one fab is identical
to a different fab’s 0.6 μ process, as I don’t know of any two processes running on different
fabs that are identical. What is important here is that the great utility and flexibility of CMOS,
in general, has allowed designers to think along common lines when designing circuitry, and a
design in 0.6 μ at one fab can be converted to another fab’s rules, usually without too much
difficulty. Finally, the realization that CMOS technology can be used so universally, for both
analog and digital applications, and that the cost of building a modern fab is well over a billion
dollars, caused many companies to open production capability to outside designers: The pure-
play foundry was developed to cater exclusively to outside companies.
As nothing is perfect in life, this is the hitch: the only toys that you find in high volumes in
the sandbox are those commonly used by others.
For all CMOS processes, the main driver is logic circuitry; any analog circuit considerations
are an afterthought at most fabs. Analog, as valuable as it may be to the SOC-ASIC designer,
is painful for most fabs to embrace. Running analog parts on a standard logic line gives the
process engineer heartburn. Analog designs may use transistors identical to those in digital
ones, but a logic circuit can suffer some leakage or some degraded performance without the
part failing test; analog circuits are much more sensitive to process variations. This is why,
invariably, a second poly layer is shoehorned into a single poly logic process so that it may
later be attractive to analog and mixed-signal designers; the first thought is logic circuits, only
later is analog considered. You will find processes that have nice double poly caps, but no
mask levels that allow high valued resistors. Go figure... Often, the substrate PNP (dedicated
collector) is not characterized, so some guessing or some expensive tests may be in order.
Fortunately, mixed-signal designs are becoming more common, as complete systems can now
be fully integrated to great competitive advantage, and fabs are getting the point.

In almost all CMOS processes, the process supply voltage is fixed for the benefit of logic
circuits. From an analog design perspective, a signal range of 0 to 5 V (or less) seems
restrictive, especially when bipolar 12 or 15 V supplies are so familiar. Good things can be
done, however, in the 0 to 5 V range, or even much lower voltages, provided you can adjust
the requirements of the IC’s surrounding system accordingly. For this reason, it is valuable
for the engineer to not only have control over his custom IC design, but strong input to overall
product system design as well. You may well find that one person with a grasp of the entire
system can deliver better results than any committee ever could. There seems to be a universal
truth here, but I’ll let that go for now…
CMOS processes at the 0.6 μ and larger level run typically at 5 V. The fab spec may
indicate 7 or 8 V maximum, but don’t be tempted to push the supply to such limits;
transistors can degrade over time when stressed beyond the recommended supply potential,
and certain disastrous events like latch-up (more later) can occur far more readily. Find clever
ways to work within the recommended rules. Understanding why the foundry placed such
limits on supply voltage will benefit you greatly if you ever must push them.
At 0.5 μ, circuits have difficulty running at a full 5 V, as the source and drain regions are
very close under a 0.5-μ gate. The drain of an NMOS diffusion will, when brought to a high
potential, act like a back-biased diode, and a depletion region will grow as a function of bias
into the substrate and toward the source junction. When processes are developed, a supply
voltage is chosen that allows as thin a TOX as possible, consistent with substrate doping levels
and the growth of junction depletion regions. Once these parameters are fixed, you can’t
make a higher voltage device by simply making the gate longer, further separating source and
drain, as the thin TOX will still be the limiting factor. Thick TOX masks, in combination with
gate lengthening may get your circuit to work at 12 V, but the headache of dealing with such
special structures may not be worth the trouble. My advice: redesign the system that surrounds
the IC to accept the limited signal voltage range.
On the subject of TOX, a 0.35-μ process may have a TOX of 7 nm (70 Å), roughly 30 to 40
atoms thick. At 5 V the electrical stress is on the order of 7 million V/cm, nearly to the point
where electrons can tunnel through the oxide and cause leakage, which quickly turns to
thermal stress, thermally induced leakage, and snap! Diffused junctions, provided they are not
too close to other junctions, can often sustain a higher voltage, acting like 8- to 12-V zener
diodes. Well diffusions though, on account of their lower doping level, can often take much
higher potentials, on the order of 40 V, and can be used cleverly as the drain regions of high-
voltage NMOS devices; but again, perhaps more trouble than it is worth.

In order to pack as many transistors into as small a space as possible, and to take advantage of
the relationship between size and speed, the metal layers used in CMOS processes are
extraordinarily thin, on the order of 0.5 μ; they are effectively a wisp of metal patterned onto
the surface of the wafer. A light brush with a fingernail can rip the traces off (actually
"smear" them, as they are so soft on this scale) when they are not covered by some protection.
The metal is kept thin so that the surface of the wafer, after deposition and patterning, is as
smooth as possible: after a layer of insulation is added, a flat surface is desirable for patterning
VIAS, as the photo imaging system has a very short depth of field. The topmost metal layer,
however, may be twice as thick as those beneath, often has chunkier rules, and can show lower
resistance and higher current-carrying capability.
In any case, the current-carrying capability of a CMOS metal trace is limited. When thermal
calculations are made, it turns out that metal traces subjected to overcurrent do not fail by
outright melting, but instead through a process called electromigration, a degradation process
that is time, current, and temperature dependent. The foundry will recommend the maximum
continuous current that any given metal width is rated to accept, at various
temperatures. Usually, this value is on the order of 1 mA/μm of conductor width. This makes
the prospect of producing power ICs dim somewhat, as without special agreements with the
foundry to offer really thick metal (good luck), your maximum output currents may be limited
to the 100 to 300 mA range. Pulses, however (as opposed to continuous currents), may safely
reach 1 A, provided they are infrequent. You can build a transistor large enough to
conduct 10 A continuously, and put it in a fairly small space (0.1 mm²), but getting the current
in and out from the VDD/GND pads to the output pad is impossible.
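Sizing a trace against the rule of thumb is simple arithmetic; a sketch (the example currents are invented, and the 1 mA/μm figure is the chapter's rough rule, not a specific fab rating):

```python
# Electromigration sizing sketch using the ~1 mA/um rule of thumb from
# the text. The example currents below are invented for illustration.

def min_metal_width_um(i_amps, rule_ma_per_um=1.0):
    """Minimum trace width in um for a continuous current, per the rule."""
    return (i_amps * 1e3) / rule_ma_per_um

w_logic = min_metal_width_um(0.005)   # a 5-mA logic rail: about 5 um wide
w_driver = min_metal_width_um(0.300)  # a 300-mA driver: about 300 um (!)
print(w_logic, w_driver)
```

The second case makes the text's point concrete: a few hundred milliamps already demands a bus a third of a millimeter wide, which is why continuous high-current outputs are impractical on standard metal.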
Lead inductance, particularly at the power and ground terminals, can cause significant
substrate noise when fast logic circuits are pulling extremely sharp and strong currents
repetitively from the supply. This situation can be somewhat relieved by the use of multiple
power and ground connections around the chip, but when this substrate noise begins to
interfere with sensitive analog circuits, especially ones that interface with the outside world,
differential techniques must be used. The addition of supply bypassing by the use of the gate
capacitance of MOS transistors across the chip supply rails can help, as can the use of
packages with very short leadframe elements; however, the substrate noise problem can
present a significant limitation to final product performance.
The discussion of limitations cannot be concluded without mention of programmable
memory, something we've all become used to and that, sadly, remains out of the reach of the
generic CMOS processes you are
likely to find offered. The problem is not just the large number of masks involved (maybe 22),
but the detailed control of process parameters in production. For a given part that has very
high sales potential, a fab can possibly be persuaded to take on the task, but as an option that
can be simply "tossed in" to a custom design, the industry just isn't there yet. There are
EEROM designs that are extremely bulky, where a few bits (maybe several hundred) can be
reliably built in, but this usually requires an intellectual property (IP) agreement with the
foundry, and I've found that the difficulty, cost, and general hassle just isn't worth it. Even
if a foundry offers an EEROM solution, chances are it won't be found elsewhere, and
you're stuck with a given fab for production. Sorry, maybe next decade. However, cheap EEROMs
that can interface with your design through a 2-wire connection are commercially available,
and your part can be designed to boot code from that external source.

The Good Part


For digital circuits, the design tools you use will allow the design and use of standard cells,
drawn objects that can be created once, and used over and over anywhere in your design. This
is a lot like designing your own set of logic parts, generating data sheets for them, and using
them in any way you wish; by the way, they’ll be the fastest logic parts you’ve ever
worked with, easily by an order of magnitude. Once you are familiar with the layout tools and
the process rules, a simple standard cell library can be constructed and characterized in a few
days. The amazing part is that a function like, say, a flip-flop has a production cost of about
0.002 cents. A 2-input NAND gate has a fabricated cost of about 0.0004 cents. This is for a
0.35-μ process; the costs go down on a per-cell basis as the process becomes more advanced
(0.18 μ, 0.13 μ), but the mask costs become frightfully high.
A complete, fully usable set of standard cells may number as few as 20—basically,
functions like INV, NAND, NOR, XOR, MUX, FLIP-FLOP, ADD, and so forth. If you
choose, you can expand your standard cell library by developing more complex functions like
AND, OR, and decoders, but these can be built from the simpler gates. High-level logic
functions, like a multiplier, can be built from the standard cells and defined as cells themselves
that can be used over and over within a design, much like the smaller standard cells. A typical
16-bit by 16-bit multiplier with a 32-bit output in 0.35-μ CMOS will have a worst-case
propagation delay of about 20 ns and a fabricated cost of about half a cent.
Of course, no fab will offer a flip-flop for 0.002 cents, but when tens of thousands of such
cells are designed into a chip, ignoring one-time mask costs and the cost of your design time,
this is roughly what the per cell cost of production fabrication works out to be. Jaw dropping,
eh?
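The arithmetic behind such a figure might look like this; every number below is an assumed round value for illustration, not actual foundry pricing:

```python
# Rough per-cell cost arithmetic behind a "0.002 cents per flip-flop"
# kind of figure. All inputs are assumed round numbers, chosen only to
# show the chain of division; they are not real foundry prices.

wafer_cost = 1000.0      # assumed processed-wafer cost, dollars
dies_per_wafer = 5000    # assumed good die from one wafer (small die)
cells_per_die = 10000    # flip-flop-sized cells packed onto each die

die_cost = wafer_cost / dies_per_wafer             # $0.20 per die
cell_cost_cents = die_cost / cells_per_die * 100   # cents per cell
print(f"~{cell_cost_cents:.4f} cents per cell")
```

Two divisions take a four-figure wafer cost down to thousandths of a cent per cell, which is the whole economic argument for integration, provided the volume amortizes the one-time mask costs.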

Memories can be drawn as a single bit cell, and then that cell can be arrayed into blocks that
have addressing and I/O cells attached to the sides. You don't draw a whole memory array;
you just draw one cell and have the tools array it for you. A digital circuit has relatively few
drawn objects, placed into position and wired together, either by an autorouting tool or, at the
uppermost levels, by hand. Frequently, the area required for a given digital function will
depend largely on the size of the memories that are required, and a few trial layouts and some
simple calculations can determine the rough die size that results. This, of course, will indicate
approximately how much the die will cost, give an idea of yield (a function of die size and
circuit density), and allow an initial choice of package dimensions.
Some foundries offer predrawn standard cell libraries that can be used, although they may
require some modification and tweaking to suit your application. When it comes to memories,
you may not find what you need for free from the fab, and you may either contract with the
fab to purchase memory designs, or embark on producing them yourself. Memory design is
quite simple though, and, when under your control, can give you exactly what you need. I
suggest designing both your own standard cells as well as all of your memories.
When you do your own designs, in a way that is reasonable in terms of initial cost, you may
find the ability to include dense memories to be disappointing. We are familiar with high-
density memories as commercial parts, but they are fabricated on extremely advanced
production lines using rules that apply specifically to that kind of memory. The general
purpose CMOS processes available to you as a fabless enterprise have rules that allow a wide
variety of structures to be built, and are not intended for commercially viable high-density
memories. Although you can buy a 256-Mb DRAM as a stand-alone part, you probably won't
be able to include one larger than, perhaps, 4 Mb in 0.35-μ CMOS, and perhaps only 1 Mb of
SRAM, and even that would be on a fairly large die with questionable yield. If you need really
large amounts of RAM, bring out pins from your design to interconnect to a cheap,
commercially available one. Alternatively, many designs, when thought through carefully
from a system point of view, can minimize their need for memory. You can, however, design
very fast memories on-chip that can cycle at high rates while drawing very little power. You
will find that the driving of pin capacitances to interface with an external memory will not
only significantly increase system power, but slow the data exchange process as well. Small,
high-speed memories interfaced with logic within the chip can result in very fast DSP
operations while drastically lowering power consumption.
From a digital point of view, a major advantage to putting everything onto a single chip is
that the interconnect capacitances are much smaller
than those encountered when interfacing several ICs on a PCB. The resulting lowered dynamic
power consumption allows whole new markets to be addressed: smaller, lighter, cheaper,
and, in this particular case, the possibility of battery- or solar-powered operation. Standard
ICs, microcomputers for example, are general purpose parts designed to appeal to a wide
range of applications, and seldom do exactly (and only) what you want in the way you want it
done. It's the application specific part of the term ASIC that allows you to gain significant
competitive advantage.
The analog side of custom chip development is where product system integration and the
full value of SOC designs really begin to shine.
CMOS amplifiers are very easy to design, and they can be built in all shapes and sizes,
speeds, and drive capabilities. Rarely will the amplifier in an SOC be like the general purpose
ones used at the PCB level, nor would you want it to be. A simple amplifier can be made
from seven transistors, occupy about as much space as a flip-flop, draw as little as a few
nanoamps from the supply, and be used as a comparator, if desired. It will, however, have a very
high output impedance, and will drive loads poorly. The designer will find ways to accept this
fact and arrange the circuit so that large output currents are not required. Depending on speed
requirements and load resistance, an amplifier can be quickly designed to tackle just about any
application. Input offsets can be controlled by layout and the sizing of the devices, and a
statistical distribution of the resulting input offsets can be quickly estimated. Once the simplest
amplifier is understood, others, involving additional devices, can be designed to deliver high
output currents, fast response, or extremely high gain.
The input resistance of such amplifiers is of course infinite, as the DC gate current during
normal operation is zero. This allows the design of switched capacitor circuits, which can
provide a vast array of possible functions, even when the on-chip capacitors used are small, on
the order of tenths of a picofarad. Techniques have been developed to minimize the effects of
stray capacitance and to ensure amplifier stability in such circuits; these will be elaborated in
later chapters.
High-output drive capability (within the limits of metal migration) is easily achieved, as
output currents in the range of hundreds of milliamps are delivered by modest-sized MOS
devices. Class AB speaker drivers, on the order of a few hundred milliwatts, can be designed
to differentially drive output terminals between supply and ground. In this way, 500 mW can
be driven into an 8-Ω load from a 3.3-V supply, although consideration must be given to metal
migration issues and adequate metal runs must be used. Low quiescent current consumption
can be traded off against distortion.
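The 500-mW figure can be sanity-checked with a quick calculation. A minimal sketch, assuming sine drive and a hypothetical 0.5 V of total headroom lost across the output devices (a figure not given in the text):

```python
import math

def bridge_power(vdd, headroom, r_load):
    """Sine-wave power into a bridge-tied (differential) load.
    vdd: supply (V); headroom: assumed total drop across the output
    devices (V); r_load: load resistance (ohms)."""
    v_pk = vdd - headroom        # peak differential swing across the load
    v_rms = v_pk / math.sqrt(2)  # RMS of a full-scale sine
    return v_rms ** 2 / r_load

# 3.3-V supply, assumed 0.5-V headroom loss, 8-ohm speaker
print(f"{bridge_power(3.3, 0.5, 8.0) * 1000:.0f} mW")  # → 490 mW
```

With a smaller assumed headroom loss the number rises toward 680 mW, so the text's 500-mW figure is consistent with realistic output-device drops.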
The above speaker driver could get hot under abusive conditions (like a temporary output
short), and some measure of over-temperature protection is advisable. This is no problem:
since MOS and bipolar devices have well-established thermal characteristics, a circuit can be
designed to produce a shutdown signal to the amplifier when the die temperature exceeds a
certain predetermined threshold. By the way, single-crystal silicon has a thermal conductivity
that is quite high, greater than that of most metals: about one-third that of copper, 5 times
that of high-density alumina, and 1000 times that of most plastics. As a result, the average
temperature across the die will be quite uniform: a 1-cm cube of silicon can pass 1.5 W
through opposing faces with only a single degree Celsius of temperature drop. Therefore,
excessive power dissipation in one area of the design will easily affect the whole die, allowing
a thermal detector to be placed anywhere and still get a reasonably good measurement.
Designing such a temperature sensor is quite easy, especially once you've understood the
bandgap reference.
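The 1-cm-cube claim follows directly from Fourier's conduction law, ΔT = P·L/(k·A). A quick check using a handbook value of roughly 150 W/(m·K) for silicon at room temperature (a figure assumed here, not stated in the text):

```python
k_si = 150.0      # W/(m*K), approximate room-temperature value for silicon
length = 0.01     # m: 1-cm conduction path through the cube
area = 0.01 ** 2  # m^2: 1 cm x 1 cm face
power = 1.5       # W passed face to face

delta_t = power * length / (k_si * area)  # Fourier's law, dT = P*L/(k*A)
print(f"{delta_t:.1f} K")  # → 1.0 K, matching the text's figure
```

The same 150 W/(m·K) value also reproduces the comparisons above: roughly one-third of copper (~400) and about 5 times high-density alumina (~30).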
Bandgap references traditionally use the voltage/current/temperature characteristics of
bipolar devices to provide a reference potential that is reasonably independent of temperature.
Although the dedicated collector PNP device can be used for this purpose, MOSFETs
operated in subthreshold mode can behave very similarly to their bipolar counterparts. The
bandgap reference can be structured in any number of ways, and once you understand the
basic idea, not only are accurate and thermally stable voltage references possible, but
thermometers too.
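To make the basic idea concrete: a junction voltage falls roughly 2 mV/K (CTAT), while the difference between two junctions run at different current densities, ΔVbe = VT·ln(N), rises with temperature (PTAT); gained up so the slopes cancel, their sum sits near the 1.2-V silicon bandgap. A first-order sketch with assumed textbook values (0.6-V Vbe at 300 K, -2 mV/K drift, 8:1 ratio; none of these figures come from the text):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
Q_E = 1.602177e-19  # electron charge, C

def bandgap_ref(t, vbe_300=0.60, dvbe_dt=-2e-3, n=8):
    """First-order bandgap at temperature t (K). Vbe is modeled as a straight
    line through its assumed 300-K value; the PTAT term M*VT*ln(n) has its
    gain M chosen so the two temperature slopes cancel."""
    vbe = vbe_300 + dvbe_dt * (t - 300.0)        # CTAT: falls ~2 mV/K
    vt = K_B * t / Q_E                           # thermal voltage, ~25.9 mV at 300 K
    m = -dvbe_dt / ((K_B / Q_E) * math.log(n))   # gain that cancels the slopes
    return vbe + m * vt * math.log(n)            # = Vbe + M * dVbe

for t in (250.0, 300.0, 350.0):
    print(f"{t:.0f} K: {bandgap_ref(t):.3f} V")  # ~1.200 V at every temperature
```

The thermometer falls out for free: the PTAT term M·VT·ln(N) by itself is a voltage directly proportional to absolute temperature.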
Oscillators can be made in every imaginable way. A typical ring oscillator (an odd number
of inverters in a loop) can run at 2 GHz while drawing under 1 mA in 0.35-μ CMOS. A simple
triangle wave generator can be built with an amplifier, a comparator, an on-chip capacitor, and
a couple of current sources to produce an output period on the order of seconds. Crystal
oscillators can be easily built with associated capacitors built on-chip for added system
economy.
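The ring-oscillator number is easy to reproduce from the standard formula f = 1/(2·N·t_d), since an edge must travel the loop twice per full period. The 50-ps stage delay below is an assumed figure for illustration, not one from the text:

```python
def ring_osc_freq(n_stages, t_delay):
    """Oscillation frequency of a ring of n_stages inverters (n odd),
    each with propagation delay t_delay: f = 1 / (2 * n * td)."""
    return 1.0 / (2 * n_stages * t_delay)

# Assumed 50-ps per-stage delay, plausible for a 0.35-um CMOS inverter
print(f"{ring_osc_freq(5, 50e-12) / 1e9:.0f} GHz")  # → 2 GHz
```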
Voltage-controlled oscillators (VCO) can be put into phase-locked loops (PLL) to take the
output of extremely cost-effective crystals (a 32,768 Hz watch crystal costs about 10 cents in
volume) and produce any desired clock frequency for internal digital processing use. The
watch crystal oscillator draws microwatts of power, but you don’t have to use a cheap
crystal, although you can, and you will want a PLL in any case if your processor runs at 30
MHz or above. In fact, the PLL is very simple: the VCO can be a simple ring of inverters, the
phase comparator just a pair of flip-flops and some gates, and the loop filter an on-chip
capacitor with a few simple switched current sources. There are details to consider, for
sure, but the issues that cause problems will be detailed later.
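The arithmetic behind the watch-crystal claim is simple: an integer-N PLL multiplies the 32,768-Hz reference by its feedback divider, so the divider is just the nearest integer to the target ratio. A sketch:

```python
F_XTAL = 32768.0  # Hz, standard watch-crystal frequency

def pll_divider(f_target):
    """Nearest integer-N feedback divider, and the clock it actually yields."""
    n = round(f_target / F_XTAL)
    return n, n * F_XTAL

n, f_out = pll_divider(30e6)  # aim for a ~30-MHz core clock
print(n, f"{f_out / 1e6:.3f} MHz")  # → 916 30.015 MHz
```

The residual error (about 0.05% here) is usually irrelevant for a core clock; applications needing exact frequencies pick a crystal that divides evenly.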
Probably the most powerful analog function in an SOC design is A/D or D/A conversion:
getting real-world signals into a device, quantifying them, and processing them for output.

DACs can be simple proportional-duty-cycle (PWM) logic outputs for post-filtering off-chip,
the derivation for which may be entirely logical in nature, or more refined converters can be
developed with analog techniques. The availability of on-chip resistors with reasonable
matching characteristics allows R-2R ladder DACs to be built, which can easily achieve 8-bit
accuracy, and with careful layout, 10- and 12-bit versions are quite possible. Delta-sigma
techniques can allow filtered outputs to achieve far greater resolution, often (but not
necessarily) using switched capacitor techniques. The jump to delta-sigma conversion may
require more understanding and work than you're willing to accept at first, but these
techniques are unparalleled in performance for certain applications.
The ADC function holds perhaps more possibilities for implementation, which is fortunate,
as more systems require analog inputs than analog outputs. The R-2R ladder DAC can be used
with a simple successive approximation register (SAR) and a comparator to quantify sampled
signals, but simple delta-sigma techniques, ramp converters, and high-speed, low-accuracy
flash converters are all easily built.
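The SAR approach described above is just a binary search driven by the comparator. A minimal behavioral sketch with an ideal R-2R DAC model (the 3.3-V reference and the function name are illustrative choices, not from the text):

```python
def sar_convert(v_in, v_ref, n_bits=8):
    """Successive approximation: trial-set each bit MSB-first and keep it
    only if the DAC output does not exceed the sampled input."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        v_dac = v_ref * trial / (1 << n_bits)  # ideal R-2R ladder model
        if v_dac <= v_in:                      # comparator decision
            code = trial
    return code

print(sar_convert(2.0, 3.3))  # → 155, i.e. floor(2.0 / 3.3 * 256)
```

One comparator decision per bit means an n-bit result in n clock cycles, which is why SAR converters pair so naturally with a sampled-and-held input.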
The converter scheme you choose has everything to do with the characteristics of the signal
you are trying to quantify: its impedance as it enters the IC, the required bandwidth that
must be captured, the accuracy required, and the availability of references either on-chip or
off. The ADC structure you choose can be made to order for your particular situation.
Silicon is naturally sensitive to light; although appearing gray upon casual observation, it
becomes transparent in the infrared, cutting off at a wavelength of about 1 μm, perhaps a half
octave past what we would call deep red. At shorter wavelengths, on the order of blue and
green, light is absorbed quickly at the wafer’s surface. Red light penetrates more deeply. The
absorption of light on the silicon surface results in measurable currents, provided junctions are
applied through which the currents may be collected. The N diffusion is a bit more sensitive to
shorter wavelengths, whereas the NWELL is more sensitive to red on account of its depth. These
photodiode structures can be quite efficient as light sensors. In full sunlight, the photocurrent
may be as great as a few hundred picoamps per square micron, and the signal from even very
tiny wells or diffusions can be amplified to produce useful signal outputs. Speed is
unfortunately not as great as would be afforded by PIN junctions, which are not available in a
general purpose CMOS process. Nonetheless, the well junction has a fairly low capacitance to
substrate, on the order of 100 aF/μ², and reasonably fast photoreceivers can be built on-chip.
Both diffusions, which have a significantly greater capacitance to substrate, and well
junctions are sensitive to red LED light. Several foundries have antireflection coatings available, if
required.
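Using the figures quoted above (a few hundred pA/μ² of photocurrent in full sun, ~100 aF/μ² of well capacitance), a rough sketch of what a 10-μ by 10-μ well delivers; the 1-MΩ load resistance is a hypothetical choice, not a value from the text:

```python
import math

AREA = 10.0 * 10.0  # um^2: a 10-um x 10-um well
I_DENS = 300e-12    # A/um^2, "a few hundred picoamps" in full sunlight
C_DENS = 100e-18    # F/um^2, ~100 aF/um^2 well-to-substrate capacitance
R_LOAD = 1e6        # ohm, assumed load/transimpedance value

i_photo = I_DENS * AREA                          # total photocurrent
bw = 1 / (2 * math.pi * R_LOAD * C_DENS * AREA)  # single-pole RC bandwidth
print(f"{i_photo * 1e9:.0f} nA, {bw / 1e6:.0f} MHz")  # → 30 nA, 16 MHz
```

Tens of nanoamps into tens of megahertz of bandwidth is easily workable on-chip, which is the point of the low well capacitance noted above.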

The CMOS camera can be fabricated on a standard CMOS process, but if color is desired,
special color filter dyes must be selectively applied. Cameras are simply arrays of
photosensors, each containing its own amplifier and control circuitry (a few transistors), that
are addressed like a memory and arranged so that analog values of pixel exposure can be read
out. Pixels can be 10 μ by 10 μ or smaller, allowing low-resolution arrays to be extremely
compact. Numerous papers (and patents) exist on these devices.
Radio frequency (RF) was at one time a poorly understood art practiced by a very few
engineers, but as development tools and our understanding have improved, so has our ability
to see RF systems as analog systems, simply operating at very high frequencies. RF is
fundamental to our highly connected world today, and, fortunately, many RF projects can
be fabricated on the same CMOS processes that are widely available. Unfortunately, the
inductors that can be drawn on-chip have poor Q characteristics, and the variable capacitance
diode (varicap) structures are poorly characterized for a given process. All PN junctions show
capacitance variation with applied reverse bias, but series resistance effects are difficult to
predict and the tuning range using available structures is quite limited. Further, the effects of
stray inductances and capacitances are difficult to accurately analyze and simulate. Therefore,
many RF circuits may require experimentation to get them right. The NMOS transistor,
however, can operate at very high frequencies (several GHz) with reasonable noise figures,
provided issues like gate series resistance and substrate/well/metal conductivity issues are
properly considered.
Mixers and IF amplifiers can all be fabricated completely on-chip, with radio transmitters
and receivers beyond 5 GHz being practical in silicon. Gallium arsenide (GaAs) is chosen due
to its high mobility for low noise and very wide bandwidth devices, but much can be done on
silicon, with the standard CMOS process. Once the received information is converted down to
lower frequencies, more standard analog and digital techniques can be used. Special circuits
may need to be developed for local oscillator control; for instance, the standard cell flip-flop
may not work well in predivider applications at very high frequencies, but specialized circuits
and techniques can be developed for this purpose.
Since silicon is a semiconductor, Hall devices can be integrated to sense magnetic fields,
although the sensitivity of these devices is limited. Further, the resistance of diffusions and
wells is somewhat strain-sensitive, opening up the possibility of strain sensors on chips.
Left out of this excitement about “all of the things that can be done in CMOS” is
probably the obvious, like direct LCD driving and others, too numerous to mention.
My point is that so much can be done. The sandbox is wide, fun, and waiting.


Fabs and Processes


A wafer fab is a very serious place where workers in synthetic bunny suits shuttle wafers
from one high-tech machine to another, all the while working in perhaps one of the most
potentially dangerous environments industry has to offer. Imagine: above you is an entire
floor dedicated to fans and filters that take air from beneath the perforated floor upon
which you’re standing, filter out the most minute particles, and blow the air down upon
your head. The machines you’re working with can heat wafers to 1200°C, bombard
them with ions of elements like boron, phosphorus, and arsenic (from poisonous
gases like diborane, phosphine, and arsine) at potentials in excess of a million volts
(generating lethal x-rays), etch away surface layers with materials like sulfuric acid and
hydrogen peroxide, and grow silicon onto surfaces with silane gas, or insulation using
tetraethoxysilane. In general, everything around you is either corrosive, explosive,
way too hot to touch, or lethally poisonous.
…What fun!
Even the tiniest piece of dirt can ruin an IC, so without the air filters the yield from a
wafer would be zero. Further, certain contaminants must be strictly avoided; it's been
said that if someone were to walk into the clean room with a handful of salt, the fab
would need to shut down for a thorough cleaning. Alkali metals wreak havoc in a
semiconductor process.
Some fabs have stricter rules than others; I was once touring a fab when a worker slid
open the door to the clean room and pushed a cart of wafers right across a carpeted floor
and through the sliding door of another clean room. I think he muttered “excuse me.”
So that’s why we were getting poor yield…. That fab is now under different
management.
Fabs run 24 hours a day and shut down once a year for refurbishment.


Different Fabs, Different Missions


The field could be divided into any number of camps, but I see there being three basic
classes of IC fabs. By the term “class” I do not refer to the level of air purity, as in a
“class 10 fab environment,” but instead to the level of economic scale and
technological capability.
Class 1 fabs are constantly driving toward finer geometries. As of this writing, the
“45-nm node” is the hot topic, which is curious since most customers are just getting
comfortable with the 0.13-μ process, and are having difficulty shrinking their designs
down to 0.09 μ (90 nm). These fabs are the most expensive in the world, easily costing
over $3 billion for a facility that can produce thirty thousand 8-in. wafers a month. It is
said that their net profit is equal to roughly 50% of sales, which makes me wonder how
they figure in equipment amortization, especially when the equipment is virtually
obsolete the moment they install it. The game at the class 1 level is money, with stock
offerings and great boasting about future technology driving investors to fund the next
level. I’ve known such fabs to sell advance capacity to customers just to obtain the
funds to start construction.
Class 1 fabs cater to companies that design parts for the largest consumer markets—
computers, cell phones, and entertainment equipment, and also for large FPGA vendors.
When times are tough, wafer prices go down dramatically, profits plummet, and their
lines shift over to making SRAM or DRAM, which they sell to the generic memory
market. They are not interested in small projects, and will not even talk with you if
you're not approaching Fortune 500 status. I do not recommend attempting to work with
them, although if you have to, you may find a broker agency that can help. During hard
times you may find that these fabs will offer you space on their line, but be prepared to
get kicked out once the business from larger customers picks up.
All fabs want to know that you’re going to be successful with your project. Class 1
fabs in particular are impressed by significant venture capitalization and industry
heavyweights on your board of directors. Patents help too. They are looking for projects
that will run over 500 wafers per month, a volume that could exceed the capacity of a
small fab. They want to know if you already have a customer, or whether you’re in
“build-it-and-they-will-come” mode. The former is good, the latter is bad.
These fabs produce 8-in. and 12-in. wafers almost exclusively. The organizations were
developed as pure-play fabs (meaning they only process wafers for outside customers)
almost from inception, and are the main driving force behind the fabless semiconductor
movement. They largely began by offering standard parts and perhaps gate array
products, until it became clear that no independent designer wanted to deliver his design
to a group of other designers with their own agenda.


Now, the major class 1 fabs make the point that they are “purely” pure-play. Their attitude
is extremely aggressive, expecting to do huge volumes of business in markets that change by
the day.
Fabless companies that buy from class 1 fabs may dominate a market now, but could easily
lose on the next version of products (product cycles may be measured in months), so the class
1 fab is usually catering to a very fast-paced, high stakes, win-or-lose game. The IC design
tools used in these projects can cost millions of dollars to acquire. A single mask set in 0.13 μ
can exceed $800,000. Many software companies with frightfully expensive tools are
competing for this market (an autorouter that you rent may run $500,000 a year). I've found
designers at this level to be extremely cocky and they all look down on class 2 and class 3
fabs, or anyone who isn’t designing in the very latest process technology. One such designer
once told me that “if you have an idea for an IC, then someone else has already done it.”
His point was that the only way to win is to fab on the smallest geometry process possible, and
as quickly as possible. However, I believe there are all kinds of things that can be done; even
at 0.5 μ there are things that no one has done before, and one can profit handsomely from
the effort. In fact, he's fighting a very serious battle involving large sums of cash; truly on
the “bleeding edge.”
Class 2 fabs generally work with mature processes, sticking with the more tried and true,
and pushing down to 0.25 μ at the finest. They do offer very well characterized processes and
many such fabs are quite approachable. Class 2 fabs are in “sandbox” territory, and, I
believe, offer a great opportunity for companies to jump into this next level of engineering.
Class 1 fabs are very digitally oriented, while class 2 fabs lean more toward mixed-signal
processes.
Class 2 fabs are also easier to deal with, as much of what they’re fabricating is older
designs for which no serious advantage can be obtained by “shrinking” the design to a
newer process. Although DVD or cell phone chipsets may change every six months, some
products can remain in the marketplace for years, even decades. Class 2 fabs cater to a
different kind of customer, who runs at a different pace. I’ve found that they are much more
helpful and flexible than their class 1 counterparts if you’re from a small company. The
original idea of the pure-play foundry was to cater to independent engineering forces that
needed wafers fabbed, just that simple. The class 1 fabs, however, quickly became so
successful, driving the technology (through intense competition) to win the investment they
needed to grow, that the economics of their success seduced large established companies to
abandon their in-house fab lines in favor of the fabless model. I believe this actually came as a
surprise to many class 1 fabs, as their origin was making simple ICs for such humble things as
toys and watches.


Today, many of those abandoned in-house fabs are class 2 fabs, under new management,
that continue the business of supporting small, growing, fabless semiconductor companies.
Class 3 fabs offer some CMOS and some bipolar—usually processes developed long ago
for some proprietary project. They are very small, flexible, and approachable, but often
aren't equipped for modern processes. I consider them to be outside the sandbox, somewhere in the
grass. This is not to say that they cannot fab usable products; in fact, this might be the first
place to start if your dream involves power ICs. One reason I consider class 3 fabs to be
outside the sandbox is that their processes are not often in alignment with the general CMOS
trend. You may be able to get great bipolar devices (with many confusing mask layers) to do
useful things, but porting the design over to another fab will be nearly impossible. When you
design in a standardized CMOS process, a very similar process is likely to be available
through other fabs, so that in an emergency your production can continue.

Prototyping Services
Most fabs offer some kind of prototyping service, the least expensive being the multiproject
wafer (MPW) run, often called a shuttle run, where your design and designs from others are
collected and arranged onto a single mask set. Of course, since the maximum imaging area is
limited to approximately 20 mm on a side, the projects must all be fairly small to fit several
different projects into this area. The cost of the mask set can, therefore, be shared between the
participants. After the resulting wafer has been fully processed, it is diced into individual die,
sorted (as to who gets what), and the die are distributed back to the participants (often in
packages). Usually, MPW allows from 5 to 50 parts that can be tested. The cost for this level
of prototyping is very attractive, on the order of $5000 to $60,000, depending on the process,
but the limited number of die and the somewhat longer turnaround time for collecting and
distributing all the designs make the MPW process less attractive for those on a tight
development schedule.
Alternatively, some fabs offer a multilayer mask (MLM) process where four mask layers
can be patterned onto a single mask blank. This requires that your design be sized such that
four layers can fit on a blank; usually designs over 10 mm on a side are too large. The imager
will position the correct area of each mask during each exposure, and essentially do four times
as many (half-sized) steps and repeats across the wafer. This is more time-consuming for the
stepper, so production is not encouraged with an MLM mask set, but the mask cost is much
more reasonable than a full mask set, and several wafers can result. This allows for enough
parts to warrant immediate test marketing of the product, provided the design works. If the
design shows a flaw, the project may be set back a
bit, but the cost for the test will be limited. The turnaround time for an MLM run is usually
shorter than that of the MPW run, as only a single customer is involved.
MPW prototypes can be obtained through the fabs directly, or through one of two
prototyping services: MOSIS in the United States and Europractice in Europe.

MOSIS
MOSIS is an organization run by the University of Southern California (USC) as a service to
students, business, and government institutions. It began as a government-sponsored group
through the Defense Advanced Research Projects Agency (DARPA), and is now a nonprofit
organization administered by USC. MOSIS began in 1981, and has since been an excellent
resource for engineers getting into IC design for the first time. They offer many processes
through the MPW method, and may be able to procure wafers (at an increased cost) from fabs
that are otherwise unapproachable (that is, until your product hits 200 wafers a month). The
MPW runs that are available through MOSIS range from 1.5 μ (AMI) to 0.13 μ (IBM).
The best deal ever for a newcomer in the IC field is the MOSIS “tiny-chip” in an AMI
1.5-μ process. This process includes the base layer, so floating NPNs are available, as well as
double poly caps, but sadly, high-value resistors are not. The deal is that you draw your circuit
within a 2.2 mm by 2.2 mm space, they have the parts fabbed, and send back five pieces for a
cost of $1130 (at last check). Packaging is extra, but reasonable, on the order of $40 per part.
This allows, for perhaps twice the cost of a multilayer PCB prototype, parts that can
demonstrate to you that yes, you really can make neat stuff in silicon. I’ve done a few
designs in the AMI 1.5-μ (ABN) process, and despite its disappointing performance with
digital circuits (large and slow), I’ve built analog systems with dozens of op-amps and
analog multipliers on chip (analog music synthesizer), and also a very small RF receiver
running at 225 MHz. Just in case you think 1.5 μ is too slow for RF circuits, get this: My first
receiver prototype was showing a spurious oscillation at 2 GHz, simply because I neglected to
fully model lead inductance in my SPICE simulation. Thankfully, the prototyping is cheap,
and the problem is easily corrected.
MOSIS is probably the most important resource for the new IC designer. When I started
out, I fabbed several projects in the ABN tiny-chip format, so I could test oscillators, op-amps,
stress sensors, Hall sensors, bandgap references, optical devices, and so on. You can get the
parts packaged (by MOSIS) into 40-pin DIPs that, although awful for RF
designs (lead inductance), allow easy bench testing of your circuits using old-fashioned
breadboard methods. The 40-pin package allows 10 pads along each edge of the 2.2 mm by 2.2
mm die, and you’ll probably have a hard time using them all on your test structures; 40 pins
allow for the evaluation of more crazy ideas than you can probably imagine.
At the other end of the spectrum, MOSIS offers the IBM 0.13-μ process for a whopping
$57,500 (at last check), provided your design is less than 10 mm², but you get 40 parts back in
the deal. Wait a second here… that’s pretty expensive, no? Point is, unless you have a
serious budget, stick to the older processes (a rhythm I will beat relentlessly here…).

Europractice
Europractice offers MPW runs from AMI, UMC, and Austria Microsystems, and although
their pricing is very reasonable in the 0.35- to 1-μ range, they do not offer anything as
wonderful as the tiny chip deal.
MOSIS attempts to keep their costs down by automating their ordering process as much as
possible, which can be frustrating. They discourage customers from interfacing directly, preferring
that their automated web forms be used for this purpose. This can cause problems, as anyone
new to the process has many pages to plow through before it becomes clear as to what is
required next. For me, it’s never really clear, but I have coaxed them to help over the phone.
They’re really nice people; they’re just way too busy. In fact, MOSIS is only a few miles
from my home, and despite my expressed interest in visiting, I really doubt anyone gets
invited.
The Europractice group is a bit different. The interface can be more personal, and project
coordination can be done over the phone and by e-mail in many cases. The AMI processes
they offer are from a fab in Europe that AMI acquired, so they are a bit different from the AMI
processes that MOSIS offers. Europractice offers Austria Microsystems processes, as does
MOSIS. The MPW pricing is similar: small projects ranging from a few thousand Euros for
the older processes, to beyond €60,000 for more advanced 0.13-μ technology.

The High Cost of High Technology


These prototyping costs directly reflect the mask costs of these processes. As minimum
geometry dimensions shrink, mask costs increase dramatically. Surprisingly, production wafer
costs are more a function of the number of layers used, not so much the fineness of the
process. Although the cost of the imagers used in fine-line processes is higher, most
operations involve implantation, deposition of insulation and metal, and etching and baking
for long periods in high-temperature ovens, and
these processes don't change much when going from one process to another. The more
advanced processes do have more metal layers (as many as nine conductor layers) which
obviously increases the number of masks, while processes like 0.35 μ typically employ three
metal layers with the option of a fourth. Still, although the number of masks for a nine-metal
0.09-μ project may be double that of a three-metal 0.35-μ one, a single mask
in 90 nm may cost more than an entire mask set in 0.35 μ.
The number of devices that can be fit in a given space roughly increases as the square of the
geometry size reduction, while speed is improved and dynamic power consumption is
lowered. The initial cost of masks, however, makes the use of fine-line processes prohibitive
for low production projects. Unless you really need very high speed and low power, it may be
better to make your design in 0.5 or 0.35 μ, where the mask costs are reasonable, as opposed
to 0.13 μ where the resulting die may be tiny but the mask charges exceed any possible
production savings. At a minimum, you can get into a market with a lower entry cost at 0.35 μ,
and then once the market is proven and making money, funds will be available to scale your
design for higher production economy.
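The trade-off described here reduces to a simple break-even calculation: the shrink's per-die saving must repay the extra mask cost before it pays off. All the dollar figures below are hypothetical placeholders, not quotes from the text:

```python
def area_shrink(old_geom, new_geom):
    """Device count in a fixed area grows roughly as the square of the shrink."""
    return (old_geom / new_geom) ** 2

def break_even_units(extra_mask_cost, saving_per_die):
    """Units shipped before a fine-line port repays its extra mask cost."""
    return extra_mask_cost / saving_per_die

print(f"{area_shrink(0.35, 0.13):.1f}x density")    # → 7.2x density
# Hypothetical: $700k more in masks, $0.50 saved per die after the shrink
print(f"{break_even_units(700e3, 0.50):,.0f} die")  # → 1,400,000 die
```

At volumes well below that break-even point, the cheaper mask set wins regardless of how small the shrunk die would be.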
As attractive as 90-nm silicon may be, or even 0.13 μ, many issues exist with these
processes that cause huge problems with design. The operating voltage at the 90-nm level is
1.0 V, and 1.2 V at 0.13 μ—a result of the extremely thin gate oxide (TOX) required for fast
operation. To adjust for the lowered logic signal swing, the gate thresholds must be reduced so
far that when the gate is brought to source potential, the device is still solidly in the
subthreshold region; the devices never fully turn off, and power consumption due to leakage
can exceed the dynamic power required for the logic function. The subthreshold slope that
controls how completely the device turns off can’t be affected by a process variable, always
being stuck in the 85 to 95 mV per decade range. The whole chip may only measure 5 mm on
a side and cost only $3 to produce, but the masks cost over a million dollars and you may not
be able to run the project from small batteries.
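The 85 to 95 mV/decade figure lets you estimate just how "off" an off transistor really is: drain current falls one decade for every S millivolts of gate voltage below threshold. A sketch with an assumed 0.2-V threshold (a value chosen for illustration, not taken from the text):

```python
def off_current_fraction(vth, slope_mv_per_dec):
    """Subthreshold current at Vgs = 0 relative to the current at threshold:
    one decade of reduction per slope_mv_per_dec of gate underdrive."""
    return 10 ** (-vth / (slope_mv_per_dec / 1000.0))

# Assumed 0.2-V threshold, 90-mV/decade slope
print(f"{off_current_fraction(0.20, 90.0):.1e}")  # → 6.0e-03
```

Only a bit over two decades of turn-off per device; multiplied across tens of millions of gates, that is the standby-leakage problem described above.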
Besides being expensive to tool and quite power consumptive, the actual design of circuits
at 0.13 μ and 90 nm poses certain problems not encountered at 0.18 μ and above. These
processes scale “horizontally” across the wafer, but not as much “vertically.” As the
devices become smaller, the specific conductance of the process increases, that is, the current-
carrying capability of devices on a given area of silicon increases. Considering that the supply
voltage lowers with scaling, the impedance of signals on-chip decreases significantly, but the
conductors that interconnect the devices have increased resistance due to their finer conductor
size. At these levels it is necessary to analyze the resistance of
metal runs that may slow signal propagation. Further, the capacitance between adjacent
conductors becomes significant, as the thickness of the metal is greater than the spacing
between conductors. The tools used to verify layouts at these levels are advanced, expensive
software packages that take trace resistance and adjacent capacitance into account. When
you’re paying over a million dollars for a mask set, and are facing the prospect of paying for
a second one if first silicon fails, paying millions of dollars for development tools that can
properly verify your design is a good deal.

Practical Sandbox Technologies


Half of all CMOS wafers produced today are in 0.25 μ and above, but most of the news about
semiconductors is about how the boundaries of 65 nm are being pushed. This is to some
degree sensational, as we all love to read about technological progress. Advertisements for IC
development tools will boast about capability at the 65-nm node, giving one the impression
that older processes are no longer valid. In fact, you only see such advertisements because
that’s where the money is, and the money is big. Designers who need the speed and density
at 65 nm also are facing $3 million mask charges, and pay dearly for development tools. If you
don’t need such speed and density and, more to the point, if you can’t afford such tools,
then you’re invited into the >0.25-μ sandbox, where you’ll have a lot of company.
The next chapter will attempt to put the economics of IC design and production into
perspective.

Mature Process Variations


There are significant differences between processes that require understanding before the
“right” process for your project can be chosen. Particularly with the more mature >0.5-μ
processes, the issue of planarization affects your layout style and can impact the density of the
resulting circuitry.
Figure 2.1 shows cross sections of three possible IC construction variations to illustrate the
advancements that are typical of the larger geometry processes.
In Figure 2.1A, FOX insulates the substrate from POLY, and POLY is insulated from
metal1 (M1) by the first insulation layer, composed of boro-phospho-silicate glass (BPSG).
When heated, BPSG will “flow” to round-off sharp corners, making a somewhat smoother
surface upon which to pattern M1. The variation in overall profile that results causes
difficulties when connecting M2 to M1. Sharp steps in the overall profile make photo imaging
and etching difficult in certain cases, for instance, the coincidence of M1 and POLY. This is an older process, typical of 1 μ and above.

Figure 2.1 Variations of surface planarity that limit feature locations.
In Figure 2.1B, the second insulation layer has been planarized to provide a smoother surface upon which to pattern M2, so concerns about the location of M2 features become unimportant; most 0.6-μ processes have this level of planarization. Vias must still be drawn some distance from contacts, however, because the surface of M1 is not smooth in the contact areas, which can complicate design and make your layouts less “tight.”
In Figure 2.1C, the BPSG layer is thicker and planarized as well, which increases the depth through which M1 must pass to make contact with the silicon. In this case, the contacts are “plugged”: metal is deposited isotropically, filling the contact openings, and then etched anisotropically from the top, leaving a flat surface upon which to deposit and pattern M1. Subsequent layers are planarized and plugged so that the resulting structures are flat and more easily photo-engraved. This allows the stacked via, a feature found in a few 0.6-μ processes and in
most 0.35-μ processes. The ability to place vias anywhere without regard to POLY or contact
locations makes layout much easier and quicker.
As processes become more planarized, especially in the case of plugged contacts and vias, the contacts and vias that you draw must be of an exact size, so that the deposited metal completely fills the holes in the insulation. Older processes that are not plugged will allow large vias, or stripes of contact, but newer processes demand exact-sized contacts and vias, even in bonding pads and in the peripheral areas, which is a pain, but you learn to put up with it. At least you now know why. It’s often difficult to find a person at the fab who can answer the “why” question; due to complexity, they tend to simply follow orders.
Processes are often abbreviated, as in 0.6 μ 2M2P (2 metal, 2 poly) or 0.35 μ 3M1P (3 metal, 1 poly). Some processes are so multifunctional that the layer list can be daunting: base layers, high-voltage masks, special implants, multiple poly, extra metal layers, extra wells of different doping for high-voltage use, and so forth. Each process must therefore be carefully evaluated before making any decisions about its suitability for your use. Further, many of the special layers can be ignored if you are not using them, and you can extract, from the fab’s layer set, a simplified one that applies only to your purposes.
I’ve found that in many cases the ability to fabricate compact resistors of reasonable value (100K) is necessary in low-power analog designs, so you may place the availability of high-resistance poly layers (100K or so per square) high on your shopping list. The same goes for double-poly caps and perhaps the degree of planarization. A fab representative can usually give the features of a particular process over the phone, so you don’t have to sign NDAs and labor over a new process specification to get a good idea of who offers what. You don’t want to get seduced into a process that looks great and begin your design, only to find that the 4 MΩ of combined resistors you need for a low-power ladder DAC takes up a square millimeter, and that the poly resistor has a capacitance to substrate that will slow settling, a characteristic proportional to resistor area.
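To see why high-resistance poly ranks so high on the shopping list, here is a sketch comparing the area of that 4-MΩ resistor at two assumed sheet resistances; the 50 Ω/sq and 100 kΩ/sq figures, the 2-μ width, and the serpentine pitch factor are all illustrative assumptions, not fab data.

```python
# Area of a large resistor vs. sheet resistance (illustrative numbers only).

def resistor_area_um2(r_ohms, r_sheet_ohm_sq, width_um, pitch_factor=2.0):
    """Serpentine resistor area: squares * width^2, times a pitch factor
    to account for spacing between the folded segments."""
    squares = r_ohms / r_sheet_ohm_sq
    return squares * width_um * width_um * pitch_factor

standard_poly = resistor_area_um2(4e6, r_sheet_ohm_sq=50, width_um=2)     # ~50 ohm/sq (assumed)
high_res_poly = resistor_area_um2(4e6, r_sheet_ohm_sq=100e3, width_um=2)  # ~100 kohm/sq (assumed)

print(f"standard poly: {standard_poly / 1e6:.2f} sq mm")  # a large fraction of a square millimeter
print(f"high-res poly: {high_res_poly:.0f} sq um")        # a few hundred square microns
```

The same resistance shrinks by the ratio of the sheet resistances, here three orders of magnitude in area.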
Different fabs have different rules on metal width, in particular, how wide a metal strip can be. This isn’t a deal breaker, but it can cause significant trouble when making very wide connections to carry high currents. The fab will say that large metal areas cause stress that will affect reliability, and will often insist that no metal be wider than, say, 30 μ, and that such wide strips have large separations from adjacent ones. The design rule checking for these geometries is sometimes difficult, and the aggravation of constraining your layout to meet these rules can be painful. Bonding pads can be huge, so why not other metal? Sorry! There’s the “why” question again…

Other fabs, however, ones that have actually done testing, may allow unlimited metal width. I’ve fabbed projects in such processes, and believe me, it’s a breath of fresh air. You don’t know whether the fab’s process actually causes dangerous stresses, and quite likely they don’t either. Many processes today are “borrowed” from another company, and the fears that were racing through the minds of the original process engineers were carried into competitors’ processes without understanding or question. You’ll see several instances of unexplained nonsense in different fabs’ rules, but the fab won’t go to the trouble of actually identifying the reason for a rule; they find it much more convenient to stick with the established rules and count on you to abide by them. If you’re daring, however, you can usually do whatever you like, as long as you twist some arms at the fab and agree to accept the silicon as it’s fabbed.
Concerning metal rules, another one that can creep up on you from behind is the area density rule, which seems to become more important as process technology gets finer. To keep the etching process uniform, most fabs don’t like large areas of solid metal in one area while other areas have fairly sparse metal utilization. This rule drives me batty, because the solution is to cover sparse areas with metal that you don’t need electrically; but if you don’t, the fab will when you send them your design. Fabs have programs that will analyze your design and add metal wherever the program thinks there isn’t enough. Since you probably don’t want a spare chunk of metal that nicely couples the input to the output of that wideband amplifier you worked on so hard, you’re pretty much forced to deal with it as you design; put in the metal and ground it.
Another rule that can cause no end of frustration is the antenna rule. The thin oxides under transistor gates are electrically fragile, and the plasma etching used for all layers can cause electrical charges to develop during the etch, threatening the breakdown of TOX. You must calculate the area of a transistor gate and make sure that the area of any poly or metal connected to it (during any given etch operation) is smaller than the maximum allowed; typical ratios are 200:1 to 500:1. Once the gate input is finally connected to a source/drain connection (as will ultimately be the case), the problem stops due to diode conduction, so gate inputs that require long-distance connections must be brought quickly to the uppermost metal layer, where connectivity to device outputs is finally complete. This is a nasty rule, difficult to check with DRC, and it must always be considered when planning a project.
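The check itself is simple arithmetic, even if running it over a whole layout is not. A minimal sketch, assuming a 400:1 allowed ratio and hypothetical geometry:

```python
# Antenna-rule check: conductor area tied to a gate during any one etch step
# must not exceed max_ratio times the gate (thin-oxide) area.
# The 400:1 ratio and the geometry below are assumptions for illustration.

def antenna_ok(gate_area_um2, conductor_area_um2, max_ratio=400):
    return conductor_area_um2 <= max_ratio * gate_area_um2

gate = 0.6 * 2.0      # a 0.6 x 2.0 um gate: 1.2 sq um of thin oxide (assumed)
metal = 1.0 * 600.0   # a 600-um run of 1-um-wide metal tied to that gate

print(antenna_ok(gate, metal))  # 600 > 400 * 1.2, so this prints False
```

A run like this would have to be broken by jumping to the uppermost metal layer, as described above.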
When you DO settle on a process, you really have to get to know it well before you can make good design choices. Expect to spend considerable time (maybe days) fully understanding each design rule and the
characteristics of each device (resistors and capacitors) before you begin any design. Get to
know the parts first.
Finally, you must recognize that no fab today believes they can make money by doing
business with small-scale customers. I suggest you take as little of their time as possible
asking simple questions, for this will only reinforce their assumptions about you being a waste
of their time. Remember, you’re only one of a growing number of small businesses that are
trying to gain access to the playground; don’t wreck it for the rest of us. Beware of fabs that
produce class 2 products but aspire to, or fool themselves into thinking that class 1 status is
around the corner. These fabs will reject you simply to maintain their self-image, and you
shouldn’t waste your time on them. I believe that one day, maybe not so far off, small-scale
IC production will be common and several easily approachable fabs will emerge to take
advantage of the business opportunity. Until then, help us all encourage the smaller class 2
fabs to embrace our business. Please try to make it easy for them.


Economics
The preceding chapter may have your head spinning, wondering how you’ll announce to your manager/investor that the cash required for your first project will run well into seven figures, but that’s not necessarily the case. In fact, the cost of doing your own SOC/ASIC designs can be quite affordable.
Let’s look at the mask tooling costs for various process technologies:

Process, μm   Vdd, V   Metal layers   Gates/sq mm   Mask set cost, $

0.065         1.0      9              400K          3,000,000
0.09          1.0      9              200K          1,500,000
0.13          1.2      7              100K            750,000
0.18          1.8      5               40K            250,000
0.25          2.5      5               24K            150,000
0.35          3.3      3               12K             40,000
0.5           3.3      3                5K             20,000
0.6           5.0      2                4K             18,000

The number of gates per square millimeter is of course approximate; packing density varies considerably depending on whether cells are autorouted or hand-packed, and more layers of metal tend to allow tighter packing. The mask costs are also approximate, as the actual number of metal layers used and options like double poly and high-resistance layers will affect the mask cost.
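The practical meaning of the table is clearer when the mask set is amortized over a production run. The sketch below takes mask costs from the table; the wafer cost and good-die-per-wafer figures are assumptions for illustration only.

```python
# Amortized cost per part: tooling spread over the run, plus per-die wafer cost.
# Wafer cost and die-per-wafer numbers are illustrative assumptions.

def cost_per_part(mask_cost, volume, wafer_cost, good_die_per_wafer):
    return mask_cost / volume + wafer_cost / good_die_per_wafer

# The same small design tooled in 0.6 um vs. 0.18 um (assumed die counts):
for process, mask, gdpw in (("0.6 um ", 18_000, 1500), ("0.18 um", 250_000, 6000)):
    for volume in (10_000, 100_000, 1_000_000):
        c = cost_per_part(mask, volume, wafer_cost=2000, good_die_per_wafer=gdpw)
        print(f"{process} @ {volume:>9,} pieces: ${c:.2f} per die")
```

At small volumes the cheap mask set dominates; only at high volume does the denser process win, which is the pattern the cost-per-die figures later in this chapter show.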
What can be seen, however, is the significant jump between 0.35 and 0.25 μ, where the
tooling costs abruptly increase as the current technology limits are approached. For
designs that require huge amounts of on-chip processing, or especially on-chip memory,
these fine-line processes are crucial to achieving production economy. I would like to
offer though, that through careful system planning, many projects can be reduced to a
point where valuable SOC solutions can be built into the larger minimum geometries,
where the cost of entry is much lower.
Further, an aspect not shown above is the cost of the design tools, which begins to increase sharply once the 0.18-μ barrier is crossed. At 0.18 μ and above, reasonably inexpensive electronic design automation (EDA) tools are available, which allow the drawing and analysis of circuits in a straightforward manner and cost less than $50,000 for a complete package. Below 0.18 μ, the effects of conductor resistance, line-to-line capacitance, and the expectation of yet higher processing speeds necessitate tools that jump dramatically in both performance and cost. Any number of tools can be purchased and used in conjunction to provide more accuracy (at the expense of complexity) in design, analysis, and simulation. The cheapest design package for deep-submicron design approaches a million dollars, and a single, complete, state-of-the-art tool set easily exceeds several million. Almost universally these advanced tools run on UNIX workstations, whereas the lower-cost tools run on Windows.
It seems possible, maybe even expected, that the high cost of advanced mask sets would decrease over time, but this does not appear to be the case, at least if the cost of more mature process mask sets is any indicator. Over the 10 years or so that the 0.35-μ process has been run, no significant decrease has been seen in mask cost. It is becoming understood that the next level of miniaturization may require mask sets costing as much as $5 to $10 million, indicating that a point of diminishing returns is being quickly approached. In fact, an idea with great potential is that of writing the mask pattern directly onto the wafer, instead of using an optical mask process that also limits patterning resolution. When such expensive masks are involved, wouldn’t it be nice to prototype a few parts for testing at essentially zero mask cost?
When masks are made at more advanced technology levels, the graphical layer
information must be passed through a software tool that distorts the geometries in such a
way that they will reproduce more faithfully when imaged by the stepper optics. At these
levels, UV light is used in imaging in an attempt to minimize the effects of diffraction
that will blur the fine details. After all, the imaged objects are only a fraction of the
wavelength of the light being used. Optical proximity correction (OPC) software will
carefully analyze the layer geometries and produce a result for mask generation that
allows such fractional wavelength imaging. The process is time intensive and expensive.

Device Overhead and Yield


The driving force toward finer geometries is the economics of producing very complex and fast designs. As device dimensions shrink, so do load capacitances: the capacitive loading of a gate input is reduced at smaller geometries, despite TOX becoming thinner, making smaller-geometry circuits faster while showing lower dynamic power consumption. In the case of memories (DRAM, FLASH), the need for device density is obvious. If a large number of devices is desired, the only alternative to smaller geometries would be larger chip area, which unfortunately suffers from yield problems. Every process can be characterized by an expected defect density, a function of the purity of materials used, not the least of which is the cleanliness of the clean-room air. Typical defect densities are on the order of 0.5/cm², which gives a 1-cm² die a bit more than a 60% probability of being good. For a 2-cm² die, the maximum size that can be imaged with current lenses, the yield would be very low, on the order of 15% to 20%; an 8-in. wafer might yield eight good devices. At a wafer cost of $2000, each die would end up costing several hundred dollars.
In addition to getting more raw devices on a wafer, making the die smaller makes it cost
less, because you throw away a smaller portion of the possible candidates. A design that
measures 4 mm on a side, if well designed, should yield 90%, or better, providing over 1500
good parts from an 8-in. wafer.
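The arithmetic behind these figures follows from a simple Poisson defect model, Y = exp(−A·D). The sketch below uses the 0.5/cm² defect density and the roughly 25,000 usable mm² per 8-in. wafer quoted in this chapter; it is a first-order model only, and it lands a little below the 1500-part figure above, since it ignores edge losses and placement details.

```python
import math

# First-order Poisson yield model: Y = exp(-die_area * defect_density).
# Defect density and usable wafer area are taken from the chapter's figures.

def poisson_yield(die_area_cm2, defect_density_per_cm2=0.5):
    return math.exp(-die_area_cm2 * defect_density_per_cm2)

def good_die(die_mm, usable_mm2=25_000, defect_density_per_cm2=0.5):
    gross = usable_mm2 // (die_mm * die_mm)  # candidate die on the wafer
    die_area_cm2 = die_mm * die_mm / 100.0   # mm^2 -> cm^2
    return int(gross * poisson_yield(die_area_cm2, defect_density_per_cm2))

print(f"1 sq cm die yield: {poisson_yield(1.0):.0%}")   # a bit more than 60%
print(f"good 4-mm die per 8-in. wafer: {good_die(4)}")
```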
An IC design is not all circuitry; it has considerable overhead associated with bonding pads and their electrostatic protection devices, space between die for saw lines, and power distribution.
Figure 3.1 shows a typical arrangement for a 4-mm-square die as it would appear on a wafer. The die are spaced by the width of the “street,” or “scribe lane,” which is typically 100-μ wide to allow for the dicing-saw blade that will divide the wafer into individual die. The seal ring is a structure that brings the layers to a coordinated conclusion about the periphery of the device; in any case, bonding pads cannot be too close to the edge of the die, or the die edge will collapse during wire bonding. The pad frame comprises the bonding pads, in this case 100-μ-square metal areas that are exposed so that connections can be made during packaging. Inside the pad frame is a power and ground distribution bus that must not only distribute power to the core but also supply the protection devices immediately connected to each pad. The protection devices can be built under the distribution bus or, alternatively, between pads; in either case they need strong power and ground connections to which excessive pad currents can be clamped.

Figure 3.1 Illustration of die as positioned on a wafer.

The 68-pin outline above has a usable core area of about 3.3 mm by 3.3 mm, but the used area on the wafer is 4.1 mm by 4.1 mm, the core occupying maybe 65% of the total space. Smaller designs will be even less efficient. When designing a part that has many I/O connections, you must respect the packaging house’s rules regarding pad size and spacing. Designs that are necessarily large due to numerous bonding pads, leaving more room in the core than needed, are called pad-limited designs, whereas in the opposite case, where the core is large and the number of pads is small, the design is said to be core limited. In designs where few pads are required, circuitry can be placed between the pads, but the corners should be avoided for both pads and circuitry. The contraction of plastic packaging materials during molding places stress on the sharp die corners and can lead to packaging failures, or even worse: field failures can occur when the part is subjected to thermal cycling.
Depending on photo-imager limitations, this 4-mm-square die will probably be arrayed as a 4 by 4 matrix on each mask, and the steps across the wafer would be 16.4 mm in both rows and columns. In an effort to maximize stepper throughput, images that would fall significantly past the edge of the wafer are omitted. Further, certain possible image areas, called optical alignment targets (the OAT fields), are not imaged but are used in aligning the wafer to the imaging apparatus. These factors all constitute a loss of potential die.

Figure 3.2 shows a typical 8-in. wafer with 4-mm-square die patterned in this fashion. Notice the clamp area that surrounds the wafer, where the layers are not fully processed. If even a corner of a die enters this zone, it cannot be fabricated reliably; for this reason a smaller die can provide greater area yield than a large one. From this you can prepare a wafer map that indicates the exact positions of known good candidates and can assist in wafer probing and the final packaging operation. The clamp area is 3- to 4-mm wide.
An 8-in. wafer has about 25,000 usable square millimeters, depending on die size, and a 6-in. wafer has about 12,000 usable square millimeters.
The nonrecurring engineering (NRE) charge for the production of a set of wafers will
include more than the cost of the mask set. Fabs will require a data preparation charge of
perhaps $1000 to cover the cost of arranging your mask data into the proper array for mask
making, while adding foundry process control monitors (PCM) into the street areas. The PCM
is a long and very thin drawn structure that contains a large number of probe pads connected
to various fab-supplied structures. This allows the fab to probe the wafer after it is finished,
providing process feedback to adjust their wafer production machinery and to generate a
report that you can use to correlate your IC’s characteristics with actual measurements of
that wafer’s process parameters. Such parameters include threshold voltages and the
resistances of all of the layers. The PCM structure is destroyed when the wafer is diced.

Figure 3.2 Step and repeat locations on an 8-in. wafer, die size = 4 mm by 4 mm.

Further, you may want the fab to do a design rule check (DRC) on the design. Although
your design tools will be able to check such rules as you design, often the rules you use will be
a simplified set that checks most rules but not all. A complete, final DRC check, done by the
foundry on their EDA tools with their “golden rule set,” is advised and can cost from a few
hundred dollars to a few thousand, depending on design size and number of iterations. A
complete DRC on a large design, involving all the possible rules on all of the possible
structures can take a day for a fast computer running expensive software to complete.
Production packaging can be quite inexpensive—on the order of 1 cent per pin—but setting
up to do packaging in high volume requires significant coordination; no packaging house
wants to put 100,000 of your $4 die into 20 cent packages to find later that they were all done
wrong. Further, you will want a marking on the IC, which needs a printing tool. Setup could
cost a few thousand dollars, provided your part fits into a standard leadframe. Often this cost is
borne as a minimum lot charge.
Once your parts are finished and packaged, you must do a final test to reject failures from
both wafer processing and package assembly. A test house is most convenient, as these
businesses do this exclusively. They have many different kinds of IC testing equipment for
analog or digital designs. Your part will require a device-under-test (DUT) fixture and the
generation of files that define the input and expected output signals that indicate a good part.
Normally, you will provide a file from your simulations in the case of a digital design, or
coordinate with an analog test engineer at the test house to specify pass-fail limits. The cost of
this initial work can run from $2000 to $10,000, or more, depending on test complexity. The
production cost of testing depends on the expense of the tester your chip is assigned to and the
time it takes for each test operation. Tests can cost from about 3 to 10 cents a second, which
can be a sizable percentage of overall part cost, if the test routines are not carefully considered.
Neglecting the cost of prototyping, where most tests are done by you in a setup that resembles the part’s application environment, and neglecting the cost of prototype packaging, which can be very expensive, we can put together some rough estimates of final part cost based on setup and production costs, die size (which affects yield), process, and production-run volume.

Examples of Economy, the 10K Gate Level


The following graphs shown in Figures 3.3, 3.4, and 3.5 indicate production die cost vs.
production volume, and are based on the assumption that only a single mask set will be
required. Many designs will require at least one design revision, due to a design error or
perhaps
based on customer feedback once the part is offered to the customer. Further, these graphs do
not include the cost of design engineering time or the cost of the EDA tools used.
At the 10K gate level, many useful devices can be built. As can be seen from Figure 3.3, parts may cost under a dollar in quantities as small as 20,000 pieces. Here the yield is good for all processes, and at this level of complexity, even in huge production volumes, finer-line technologies offer only a slight advantage. For small projects such as this, a reduction in geometry to 0.35 μ can offer better economy, but only in large production volumes. The use of more advanced technologies cannot reasonably be justified on a cost basis, even at sales volumes in excess of 100 million units.
Such small gate counts can be handled by field-programmable gate arrays (FPGAs), but a custom chip can include analog functions that gather other system needs into the device. This is where custom IC development really shines: up to the 20K gate range, with memories (only what you need) and analog functions on chip. The higher supply voltage that 0.6 μ affords (5 V) makes the analog functions easy and convenient.
In fact, the 0.6-μ, mixed-signal, <10K gate area is where the really novel, low-production-rate designs live. Logic circuits without analog stuff on-chip can only be justified when you’ve got a really good application that can be used in very high volume; FPGAs can likely be used instead, and you don’t have to go through all of that messy learning about IC design.

Figure 3.3 Cost per die at the 10K gate level.

The 0.6-μ process runs at 5 V, but it can run at 2 V too, provided you’ve designed your part accordingly. A 3.3-V process will require extra masks to allow 5-V devices, and in 0.35 μ, 5-V devices aren’t much smaller than their 0.6-μ cousins. Threshold voltages are lower in 0.35 μ though, and if really low-voltage operation is needed (like a single 1.5-V battery), 0.35 μ may be your best choice.
It is the integration of multiple analog functions with small logic circuits that gives the SOC its power, and for these devices 0.6 μ is excellent. A sensor circuit that takes analog signals from the real world, quantifies them with an ADC, supports a crystal oscillator, and includes a bandgap reference, a regulated power output, a temperature sensor, and a 2-wire logic interface to an external microcomputer (about 1K gates) may fit in an 8-pin SOIC package and have a die size of 2 mm² or less. In quantity, the packaged and tested cost would be about 25 cents.

The 100K Gate Level


At the 100K gate level, which is typical of a fairly complicated DSP design with significant on-chip memory, the 0.6-μ process is quickly eclipsed by 0.35 μ, simply on account of die size and poor yield. The 0.35-μ process, however, holds strong as the most cost-effective choice until volumes on the order of 1 million pieces are approached (see Figure 3.4).
Projects on the order of a few hundred thousand gates can be very complex, including
several hundred thousand bits of memory and fast, wide signal-processing circuitry.

Figure 3.4 Cost per die at the 100K gate level.

A gate is typically considered a two-input NAND gate, a small structure with only four transistors. Some examples of area usage are as follows:

■ D flip-flop: 4.5 gates
■ Reset flip-flop: 6.5 gates
■ Full adder: 5.5 gates
■ 2-to-1 MUX: 2.5 gates
■ Tristate buffer (internal): 2 gates
■ ROM bit: 1/12 gate
■ SRAM bit: 1 gate
■ DRAM bit: 1/3 gate
■ Simple analog comparator: 5 gates
■ Compensated op-amp: 10 gates
■ 24-by-24 multiplier: 3700 gates

As can be seen, lots of processing can be had at the 100K gate level, that is, if you only use
the gate space you need.
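The equivalents above make quick gate-budget estimates straightforward. A sketch, with a purely hypothetical design mix:

```python
# Gate-budget estimate from the per-cell equivalents listed above.
# The design mix below is hypothetical, for illustration only.

CELL_GATES = {
    "dff": 4.5, "reset_dff": 6.5, "full_adder": 5.5, "mux2": 2.5,
    "tristate": 2.0, "rom_bit": 1 / 12, "sram_bit": 1.0, "dram_bit": 1 / 3,
    "mult24x24": 3700,
}

def gate_count(design):
    return sum(CELL_GATES[cell] * count for cell, count in design.items())

design = {
    "mult24x24": 2,         # two 24 x 24 multipliers
    "full_adder": 48,       # accumulator adders
    "dff": 2000,            # pipeline and state registers
    "sram_bit": 64 * 1024,  # 64 kbit of on-chip SRAM
}
print(f"{gate_count(design):,.0f} gates")  # the SRAM dominates the budget
```

Even a modest amount of on-chip memory dwarfs the logic, which is why memory choices drive die size at this level.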

The Million Gate Level


A million gate design is very large and complicated, and as we see in Figure 3.5, die size and
poor yield make 0.6 μ unreasonable at almost any quantity. Small lots may be designed for
low-volume applications in 0.35, 0.25, or 0.18 μ, but the cost may not compete well with
standard parts-on-PCB product solutions. Further, if a second mask set is required due to a
design flaw found only in first production, 0.35 μ becomes the more attractive choice by far. If
very high volumes are anticipated, good economy can be had at 0.13 or 0.09 μ, but little is
gained by reaching to 0.065 μ (65 nm), even in spectacular quantities. A million gates,
however, is an enormous amount of logic.
It is hard to fathom a collection of logic circuitry that would approach a million gates. Considering the processing that can be afforded at the 10,000-gate level, it is clear that most circuits in this range consist of either large memory blocks or highly parallel architectures, or are the result of stunningly poor design choices. The latter seems improbable, but considering that many circuit blocks in complex chips are designed by synthesis and routed by autorouter software, inefficiency can be hard to recognize. The designer must envision a minimized architecture and compare that vision with the resulting jumble of cells
and interconnects that result from automation. This can be so confusing that inefficient layouts can find their way into designs without the designer realizing it.

Figure 3.5 Cost per die at the 1 million gate level.
By far, the most efficient (in terms of silicon used) layouts are done by hand, while the most
efficient layouts (in terms of design time) are synthesized and autorouted. A logic synthesizer
can take a simple function, say, a decoder that accepts a dozen bits and produces a few outputs
that respond only on specific input combinations, and reduce the logic to an incomprehensible
schematic that contains fewer gates and propagates faster than the same function that a
designer would work out by hand. Synthesizers are pricey; they require a somewhat cryptic
input definition (a text file) and produce a netlist as their result. This netlist can be translated
into a schematic that is logically correct but difficult to follow. At some point, you simply
accept the result rather than trying to verify the output by following the schematic.
Demonstrations of the synthesis of blocks small enough to be compared to their
human-designed counterparts are impressive. Large circuit blocks, however, can be
increasingly inefficient; synthesizers are often flummoxed by a simple function like a
multiplier, so they are programmed to recognize multipliers and insert a preformed logic
block instead of attempting
to actually synthesize the circuit. In certain cases, a simple result can be imagined by a
designer, but only a complicated one is imagined by the synthesis engine.


A similar situation occurs in microcomputer code development. The most dense code is
written in assembly, which occupies a minimum of memory and executes very quickly, but
compilation of code from a higher level language (such as C) can significantly reduce the time
required in programming at the expense of less than optimum execution time and a larger
executable file. Some time ago I wanted a simple program, one I had written variants of many
times, that would occupy maybe 2 KB of code space, and hired a consultant to get the job
done. The result, compiled from C, came back at 32 KB. I was shocked, but the consultant
simply brushed off my complaints, saying that “memory is cheap.”
Ah yes, and today is it gates that are cheap? Perhaps, but their cost is not zero!
I’m not saying that, through careful consideration and endless hours of hand layout, an
inefficient 65-nm design can be redone cost-effectively in 0.6 μ; the absurdity of such a claim
would be measured in astronomical terms. I would like to offer, however, that circuit functions can
often be chosen with layout structures in mind, and even specialized cells can be designed
such that, when arrayed, they make layouts extremely dense, allowing the process choice to
shift toward the cruder geometries, lowering the cost of entry (mask cost) by nearly an order
of magnitude. Further, such structures can significantly improve data throughput while
simultaneously lowering clock rates and power consumption, and they make the circuit easier
to understand, modify, and debug. An autorouted block, the netlist of which came from a
synthesizer, is incomprehensibly arcane.
When an engineer is faced with the development of a circuit block without an understanding
of those other blocks to which it must interface, a certain synergy is lost. A system designer
often plans an entire project without delving into the finer details of each subsystem, assuming
that circuit functions can be adequately compartmentalized, largely out of convenience.
Further, when hardware is described by a hardware description language (HDL) for input to a
synthesizer, the construction of the function description will determine the resulting netlist.
Inefficiency in the synthesis arena is difficult to detect, and it may force the use of a newer,
more expensive process that is accepted simply because “gates are cheap.”
If your circuit is intensely competing with other designs, time to market is vital. You may,
in this case, synthesize everything, with the hope that you beat your competitor in terms of
time. If your market runs at a slower pace, where you may observe that your intentions may
have been realized a decade earlier (such instances abound), you have the luxury of time to
keep your project “under wraps” and optimize your design with more careful choices. In
this case, you will be able to use a cheaper process, more handcrafted circuits, and come up
with a more cost-effective design.


When planning an IC that is intended to replace a product’s existing circuitry, you may be
tempted to simply “copy” the functions from a PCB directly onto the ASIC. My advice is to
not even consider such a thing. Look at the overall product’s function, imagine what
additional features may be easy to provide and valuable to include, and which features can be
removed; learn about all the toys we have available in the sandbox and design a part from
scratch that really gets the job done cost-effectively. You will find that such an approach gives
you a better product with lower overall costs.

Test and Wafer Probe


Back in the days when yields were low and packaging was more expensive, it was (and still is)
common to probe the wafers prior to dicing and bonding—a preliminary test that would sort
out bad die. A probe card would be made, which would connect to a tester, and a programmed
machine would step the card onto a die site, make connection to the pads, test the die’s
characteristics, mark the die center with a black dot (in the case of a failure), then lift the card
slightly and step to the next location. The black mark is recognized later by the packaging
house as indicating a failure, and the die will be rejected. When the die are small, with a high expected yield,
and the cost of packaging is relatively low, it can make more economic sense to package all
the reasonable die candidates, then test the parts once, as finished parts. Parts must be tested
after packaging, in any case, to sort out packaging failures, and often it is most economical to
do one test instead of two.
The fab will have probed their PCM (process control monitor) sites and gathered data on their process, and if the
wafer is out of bounds for them, they will not sell it to you. A good wafer that suddenly
delivers bad parts indicates that the design is close to some parametric edge. When you design
you must carefully simulate your circuits at the extremes of the fab-provided process limits so
that such surprises simply don’t occur.
For very small runs of largely analog parts, testing can be done at the packaged level with a
simple, in-house developed tester that places the part in an environment similar to that of the
finished product. Low-cost testing can be done this way right on the production line,
especially when device characteristics like bandgap reference voltages or signal-to-noise ratio
are of concern to the design team. When the product looks stable, more automated testing
procedures can be developed.
I strongly advise against in-house testing as a routine alternative to having the
professionals do it, because the cost of maintaining test fixtures and of high-speed
package-handling machines prohibits it unless the volumes are very large. Using a test house
allows their equipment to be time-shared to your benefit. This is their specialty, and they
likely can do it
better than you.

Small Production Runs


Foundries are in the business of making wafers to customer mask sets; they do not eagerly
look forward to coordinating the mask making process and storing the masks for the customer
in a safe place only to actually make a few wafers. They want their customers to be successful
enough to run hundreds of wafers a month, in which case they will offer the best wafer price.
If your masks remain unused for too long, they will charge a storage fee or return them to you.
Fabs will happily work with you to make prototypes through their MPW runs, and produce a
few wafers through the MLM process (if available), but only with the hope that the project
turns into good ongoing business.
Your designs could end up in the merchant IC market, for which you will need distribution
and sales, which can be difficult for a small organization. If you do attempt to market your
chips, be sure to charge enough for them. You will need room for representatives (maybe 10%
of sales) and distributors (maybe 20%), and to pay back the initial development costs. It’s
been suggested that you should get three to four times your (volume) production cost initially,
and expect that your pricing will degrade with time to perhaps twice the production cost when
the product is mature. Very large customers, on a factory direct basis, may be cut a break to
get the deal closed.
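As a sketch of the arithmetic (Python; the $1.00 production cost is a made-up example, and the multipliers are the rules of thumb just given):

```python
def suggested_price(production_cost, mature=False):
    """Rule-of-thumb merchant pricing: three to four times production
    cost initially, degrading to about twice at maturity."""
    return production_cost * (2.0 if mature else 3.5)  # 3.5 = midpoint of 3-4x

cost = 1.00                        # hypothetical volume production cost, $
price = suggested_price(cost)      # initial price: $3.50
rep_cut = 0.10 * price             # ~10% of sales for representatives
dist_cut = 0.20 * price            # ~20% for distributors
margin = price - rep_cut - dist_cut - cost  # left to repay development
print(f"initial price ${price:.2f}, margin after channel ${margin:.2f}")
```

Note that the channel takes its cut of the selling price, not of your cost, which is why the markup has to start so high.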
More likely, your design will become a part of one of your company’s proprietary
products. In this case, tremendous benefit can accrue from the IC, but relatively few parts will
be needed to realize this benefit. My experience is in this area, where the product simply
wouldn’t exist without the IC design. Whether it’s cost, power consumption, size, or some
advanced functionality, a fairly expensive end product can result from the inclusion of a single
specialized custom circuit. Unfortunately, the finished product may do very well in sales, but
if the chip is only a few square millimeters in size, the fab may end up producing only a test
run, then a year later you may come back wanting 10 more wafers (125,000 parts). Fabs
don’t like this, and you won’t be on their A list as a result.
The minimum production run for a fab is a boat, which is a lot size of 25 wafers. The term
“boat” comes from the carriers that are used to shuttle wafers around in a fab, and the
wafers you receive will probably be in a sealed wafer carrier that also has a 25 wafer capacity.
I suggest buying more wafers less often as opposed to the other way around, keeping your
purchases to 25 wafer multiples. The fab will encourage you to do this.


Wafer pricing will vary with purchased volume—something not calculated into the earlier
pricing graphs—with the per wafer pricing ranging approximately 2:1, from a single boat
quantity to the point where price begins to level off at about 500 wafers a month.
Some products are made possible only by a single IC, without which the product could not
exist. In these cases, the IC itself may constitute a small (but important) part of the total
product’s cost, so its exact production cost matters little. Don’t get carried
away trying to reduce the IC cost from 60 cents to 40 cents, as this may look good while
focusing on the part, but when looking at the product, it’s like reducing the cost from 1.5%
of the bill of materials (BOM) to 1.0% by beating on the fab; they won’t appreciate it, and
you may need their help in the future with the next project. If you’re making money, be
generous with the fab that helped put you in business.
Don’t get carried away dreaming of just how small and cheap the part can possibly be, as
this may require more design work, higher mask costs, and smaller wafer purchases that, for a
modest market, simply don’t make sense. The range of possibilities is mind-boggling, but
once you think you have a grip on it, try to be reasonable with your choices.

A Final Word on MLM


Many foundries today are attempting to secure large business “partners” to the exclusion of
all others, trying to follow the model of the very large and fast growing class 1 foundries. The
success of this approach will depend on the endurance of Moore’s law, which states that IC
complexity will double every 18 months. Moore’s law has been challenged repeatedly in the
past, and all challenges have failed. Beginning in 1995 and continuing through 2005, the IC
industry saw an unprecedented explosion of process developments, largely fuelled by the
popularity of the personal computer, which drove device dimensions from 0.35 μ down to 65
nm. The issue today is not that smaller devices can be made, but whether the mask sets and
processes for further size reduction are affordable. Perhaps there should be a Moore’s
corollary: ICs can get smaller, but can you afford it? It is quite possible that as the cost-limited
processes mature, the effects of competition will degrade the profit performance of the class 1
fabs to the point where they no longer represent a model for smaller fabs to follow. Perhaps
the older, stable, and well-understood processes will become more available to IC designers
on a small-scale basis, and some fabs will embrace this approach almost the way prototype
PCB houses do today.
The lenses used in photo imaging are frightfully expensive devices because they are
designed to image with near perfect focus over a wide field area, requiring dozens of precision
elements in the optical path. A lens that can image a wider area on-chip can step fewer times
per wafer and have a higher throughput, but as the field widens, the lens complexity (and
cost)
drastically increases. Steppers should be able to produce the image on a smaller area, trading
off throughput for amortized imager cost. If you only need to image a small area, on the order
of 0.5 mm², a $50 microscope objective does a great job. I used to make microscopic printed
images this way when I was a kid. Typical wide-area imaging lenses are priced at millions of
dollars each. Further, as devices exceed approximately 1 cm², the expected yield begins to
drop sharply.
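The text only notes that yield falls off sharply past about 1 cm²; a common first-order way to see why (not from this book) is a Poisson defect model, in which yield decays exponentially with die area:

```python
import math

def poisson_yield(die_area_cm2, defect_density=0.5):
    """First-order Poisson yield model, Y = exp(-D * A), with D in
    defects per cm^2 (0.5 here is an assumed, illustrative value)."""
    return math.exp(-defect_density * die_area_cm2)

# Doubling die area squares the survival probability, so large die
# lose yield much faster than small ones:
for area in (0.25, 0.5, 1.0, 2.0):
    print(f"{area:4.2f} cm^2 -> {poisson_yield(area):.0%} yield")
```

Real fabs use more elaborate models (clustered defects, parametric loss), but the exponential shape is why die beyond a square centimeter get expensive quickly.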
Referring to a previous description of the multilevel-mask (MLM) process, where four
mask layers are placed onto a single mask (with the cost of just a single mask), with the
imaged area reduced to, say, 10 mm by 10 mm and the step and repeat process performed four
times as often, why not go further in this direction?
If a mask can be divided into four areas, each imaging a 10 mm by 10 mm area, why can’t it
be divided into nine areas of 6.6 mm on a side, or 16 areas of 5 mm on a side? If a 5 by 5 matrix of mask
images was to be placed upon a single mask blank, a 25 layer process could be fabbed into a
maximum die size of 4 mm on a side; and an entire mask set would fit onto one mask blank.
With the advantage of computer scheduling of operations within the fab, it is easy to imagine a
fab with one additional small-stepper that uses this technique to fabricate small run projects.
The throughput would be terrible, taking the current rate of perhaps 50 wafers per hour for full
area images down to maybe two wafers per hour at the small-step level, but the machine
would be cheaper, and the rest of the fab would be unaffected (diffusion, resist coating and
development, implantation, layer deposition, etching, oxide growth). The wafers would end up
costing more than full-imaged ones, which is expected, but the mask would be really cheap.
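A quick sanity check of the geometry (Python; this assumes a 20 mm by 20 mm full reticle field, which I have inferred from the 4-up case yielding 10 mm by 10 mm sub-images):

```python
FIELD_MM = 20  # assumed full reticle field, 20 mm on a side

def mlm_split(n):
    """Divide the reticle into an n x n grid of sub-images: returns the
    side of each sub-image (the maximum die size, in mm) and how many
    mask layers then fit on a single mask blank."""
    return FIELD_MM / n, n * n

print(mlm_split(2))  # the 4-layer MLM case: 10 mm sub-images
print(mlm_split(4))  # 16 areas, 5 mm on a side
print(mlm_split(5))  # a whole 25-layer set on one blank, 4 mm max die
```

The 5 by 5 case reproduces the figures in the text: a 25-layer process on one mask blank, with a maximum die of 4 mm on a side.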
The CMOS process has become so standardized that one can imagine PCM structures made
just for this 4 mm by 4 mm project size, to be included at the edge of each image, and enough data
from the PCM to provide pass/fail and process control feedback, while allowing small
customers to interface inexpensively with the fab on a routine basis. The customer simply
accepts the project fabbed on the well-standardized process as a commodity item; no further
discussion or coordination is required, kinda like prototype PCBs are made today. This
won’t happen, however, while fabs are all chasing the class 1 model. The irony of it all is
that few class 2 fabs will ever win that battle.

Wafer Pricing
Clearly, the cost of the finished wafer depends on wafer size, the number of masking layers
required, and the quantity of wafers that you purchase. Also, the price may vary by a factor of
2, depending on how busy the fab is. In production volumes (a few hundred a month), 8 in.,
three-metal, 0.35-μ CMOS
wafers may be priced between $1000 and $2000; their 6 in. cousins cost perhaps half as much.
Small volumes, however, may cost significantly more; if you only need a few wafers, a fab
may charge several times the production cost just to cover their cost of handling the business.
If your design is clever and you win market share as a result, then you can justify paying more.
You may then consider sending a thank you to the fab; send them a fruit basket to show your
appreciation. I can’t stress this more: They are allowing you to be in a whole new business.
If you’re making money, so should they.
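A rough die-cost sketch (Python): the gross-die formula below is a commonly used approximation, not from this book, the die area is hypothetical, and the wafer prices are the mid-range figures just quoted. Real costs would also need yield, test, and packaging:

```python
import math

def gross_die_per_wafer(diameter_mm, die_area_mm2):
    """Common approximation: wafer area over die area, minus an
    edge-loss term proportional to the wafer circumference."""
    d, a = diameter_mm, die_area_mm2
    return int(math.pi * (d / 2) ** 2 / a - math.pi * d / math.sqrt(2 * a))

DIE_AREA = 10  # hypothetical 10 mm^2 die
for diameter, wafer_price in ((200, 1500), (150, 750)):  # 8 in., 6 in.
    n = gross_die_per_wafer(diameter, DIE_AREA)
    print(f"{diameter} mm wafer: {n} gross die, ${wafer_price / n:.2f} per die")
```

Even at small-volume wafer pricing several times these figures, a few-mm² die stays in sub-dollar territory, which is why the wafer price matters less than it first appears.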

Design Tools and Time


The die costs indicated in the previous graphs do not reflect the engineering time put into the
project, or the cost of computer tools used, or for that matter, prototypes, test fixtures used in
prototype analysis, or the engineer’s time spent simply learning how to use the tools. Your
first chip may not become reality for 12 to 18 months after the project is first considered.
Subsequent designs, depending on complexity, can be knocked out in as little as a month or
so, whereupon one waits for prototypes to return from the fab, works on a test fixture for the
incoming prototypes, or starts a new design.
One of the objectives of this book is to provide guidance during the starting period, as poor
choices made early on can severely impact first chip development time; you don’t look
forward to finding yourself backed into a corner, ripping up all of your work, and beginning
again from a new perspective. Setbacks can be avoided by a better understanding of the design
process, the assortment of design possibilities, and the specifics of each design alternative.
I’ll try to detail some useful circuits later, which will hopefully inspire ideas, and include
information on certain “traps” that you will want to avoid. Getting snagged in a trap really
hurts. As they say: “The first guy through the pass gets all the arrows.” It may be
comforting to know that you’re really not the “first guy through the pass,” but if you
don’t learn from those that have been trapped, you’re out there all on your own.
If I’ve got you hooked on the idea of designing your own ICs, then, in terms of
economics, the best present investment of your time may be reading the rest of this book.


Design Tools
The preceding chapters strongly indicate that the opportunities for a successful ASIC
project involve at least some analog circuitry, and that purely digital circuits may be best
left to off-the-shelf programmable logic parts or the programming of an available
processor; such parts abound today. Further, for many valuable SOC designs, even for RF
projects that operate in the GHz range, processes like 0.6 μ and 0.35 μ are attractive. If
your objective is very dense, logic-only circuits, you may take a different design path, one
that may be outlined better in other books. Here, the tools and techniques I will
recommend are “sandbox oriented”—a synergistic combination of logic and analog
functions that provide novelty, utility, and economy.
1. Capture schematics with a schematic editor, producing a schematic netlist; or define
logic structures in RTL and synthesize a netlist.
2. Draw IC mask layers with a layout editor, or autoroute from a schematic netlist.
3. Check that the layout is within foundry rules with a design rule checker (DRC).
4. Extract the layout with an extraction tool (EXT), producing a layout netlist.
5. Compare the netlists with a layout versus schematic (LVS) tool to guarantee
sameness.
6. Determine analog circuit performance with an analog simulator (SPICE).
7. Determine logic function and timing with a logic simulator.


Any design will require very many iterations of the above sequence, developing the
design from the lowest levels up to the top level. The output of the finished circuit to the
fab will be the stream output of the top level from the layout tool as a GDSII file
(pronounced GDS2). GDSII files can be quite large, and are often sent via FTP to the
foundry, although small circuits can be zipped and e-mailed.
Integrated circuits of today’s complexity cannot be designed as a single level
schematic and a single level layout, in a flat condition, as the details are far too great to
comprehend. A flat schematic, even for a small design, would be at least 10 feet wide, and
still be barely readable without a microscope. The concept of hierarchy is used to greatly
simplify your work and make the entire project manageable. A hierarchical schematic is
one where symbols, drawn by the designer, represent underlying schematics. At the
lowest level of design, a transistor may be drawn as a symbol to be used in higher level
schematics, and attributes attached to the symbol tell the schematic tool that this is
perhaps a primitive device, and that no underlying schematic is to be expected. Transistor
symbols can then be used many times in a schematic for, say, a flip-flop, and terminals
are named in the flip-flop schematic to correspond exactly to the pins of the flip-flop
symbol that you draw. The schematics and corresponding symbols are given identical
names, which associates them. Once done, the flip-flop symbol can be used in yet higher
level schematics, each with its own symbol, in like fashion. You may have a circuit that
contains thousands of flip-flops, but you only draw the schematic and symbol of the flip-
flop once.
At the layout level, cells are drawn with named connection ports (or pins), where the
lowest levels in a design are often standard cells (NAND, NOR etc), or, in some cases,
simple transistor layouts (with no need for declared port connections) that are expected to
be used many times in cells that will have named port pins. These cells are given names,
ideally the same as a corresponding schematic, and then instantiated into higher level
cells. In this case also, a design with thousands of flip-flops requires that the flip-flop be
drawn only once. A typical design may have only four to six levels of hierarchy, but
through the use of hierarchy, an explosion of complexity can result even from relatively
few drawings; for example, if the average schematic contains 20 interconnected symbols,
the sixth level of hierarchy will represent tens of millions of primitive devices.
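The multiplication is easy to check (Python), using the 20-symbols-per-schematic figure from the text:

```python
SYMBOLS_PER_SCHEMATIC = 20

# Each hierarchy level multiplies the instance count by ~20, so six
# levels of 20-symbol schematics expand to 20**6 primitive devices.
for level in range(1, 7):
    print(f"level {level}: {SYMBOLS_PER_SCHEMATIC ** level:,} instances")
# level 6: 64,000,000 -- "tens of millions of primitive devices"
```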
The arrangement of schematics and layouts can be any way the designer likes, but
knowledge of the tools, especially LVS and the various simulation tools, will encourage
one to gravitate toward building the circuits in blocks that have schematic and layout
equivalence and the same cell names and pin names. Names don’t always have to be the
same between layouts and schematics, but making them the same will avoid confusion
later.


The Schematic Tool


Although most of my design tools are from Tanner—layout, DRC, LVS, EXT, standard cell
place and route (SPR), and schematic-driven-layout (SDL)—I use the ViewDraw tool offered
by Mentor Graphics for schematic capture. All my tools run under Windows, which is
particularly convenient, and on a laptop. S-Edit, the schematic capture tool that comes with the
standard Tanner package, is great for analog designs and logic areas that aren’t too complex,
but I am interested in wide signal processing and the liberal use of busses (a bundle of
associated wires). As of this writing, I understand that Tanner’s just-released schematic tool
is advanced to this level and more. The ViewDraw tool does support busses, but each must be
uniquely named (the tool won’t do this for you); a drawback I’ve learned to accept. All
tools will have quirks that make them different and cumbersome, and accepting the drawbacks
of the tools you’ve chosen is just part of the process.
Most schematic tools operate similarly, allowing a schematic to be built by placing symbols
onto the page and interconnecting them with wires (single connections) or busses. The
naming of a wire in one place on a schematic sheet can create
connectivity to a same-named wire at a different location, so the resulting schematic does not
need a jumble of wires cluttering up the schematic to make all of the connections. Further, any
wire of a bus, [for example: ASIG (0:150)], can be accessed by using the bus name and wire
number anywhere on a page (like ASIG12). At any time, one can descend into a symbol,
popping up a new window that shows either the schematic or the symbol to be edited and then
optionally stored. You can descend in this way through the entire hierarchy, from the top level
to the most primitive device, in just a few clicks.
The symbols that you draw can have attributes attached, keywords, and values that are
related to the schematic editor and other tools. Symbols that are already instantiated onto a
schematic can be individually characterized with special attributes particular to that symbol’s
instance, such as a transistor width, length, or model name. When drawing symbols, it is often
convenient to specify in the attribute list the pin order that will be used when the schematic
tool produces a netlist, as this pin order may need to coordinate with a simulation tool in some
expected way. Further, each symbol connection pin can have a pintype attribute attached, such
as IN, OUT, TRI, and so forth, which aids in checking schematics for interconnection errors.
When a netlist is generated, certain symbols may have underlying schematics that cannot be
understood by a logic simulator (standard cells or RAM, ROM) and attributes can be attached
to these top level symbols that instruct the netlister to stop at the symbol pins, expecting a
written behavioral model (a text file that you must write) to describe the block’s logical
function during
simulation. Behavioral models are very simple, but they must be written even for every
lowest-level cell you wish to logically simulate.
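Behavioral models themselves are written in the logic simulator’s language (Verilog or VHDL, per the netlist formats discussed below), but the idea is easy to show in Python: the model describes only what the cell does at its pins, nothing about its internal structure.

```python
class DFlipFlop:
    """Behavioral model of a positive-edge-triggered D flip-flop:
    pin behavior only, no transistors, no layout."""
    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def eval(self, clk, d):
        if clk == 1 and self._prev_clk == 0:  # rising clock edge
            self.q = d                        # capture D
        self._prev_clk = clk
        return self.q

ff = DFlipFlop()
outputs = [ff.eval(clk, d) for clk, d in [(0, 1), (1, 1), (0, 0), (1, 0)]]
print(outputs)  # [0, 1, 1, 0]
```

A real Verilog model would add pin-to-pin delays, but the content is just as small: a few lines per cell, written once.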
Schematics can be organized in any imaginable way, but you should keep each sheet
reasonably well ordered, so that later, after you’ve forgotten why you did something a
particular way, you can open a schematic and easily see how the block works. Use text
wherever you like, as a guide to design choices, timing diagrams, notes here and there,
anything that helps you think through what the block does and gives clues to your initial
thinking, even years later. Try to establish a top-to-bottom and left-to-right signal flow, and
keep the rules you develop consistent throughout the schematic set.
As much as possible, tend toward drawing and naming schematic blocks the same as you do
in corresponding layout cells. For example, if you draw a multiplier layout, attach ports to the
inputs and outputs on the layout and have a single schematic page (with corresponding
symbol) that matches the layout. Adhering to this path will allow quick verification that the
layout is in fact equal to the schematic when using other tools.
The schematic tool should have various checking options that allow the circuit to be
analyzed in terms of connectivity. Your software will enable you to turn various checks on or
off as you desire, to flag errors and warnings concerning unconnected inputs (very bad),
unconnected outputs (often just the unused QN output of a flip-flop), outputs shorted together
(also bad), or any number of conflicts that may or may not be acceptable. Learning about the
kind of checks the schematic tool can generate, and modifying your symbol generation early
on will save a lot of trouble later in the design process. When you’re in a hurry, it’s easy to
forget to attach attributes to a symbol and its pins, but remember, it only needs to be done
once, provided it is done right.
The schematic tool should be able to export files that represent the drawn schematic and all
of its underlying hierarchy, in at least three forms: EDIF netlist, which can be used as a
description of the circuit for input to an auto-place-and-route tool, automatically creating a
layout of the schematic; SPICE netlist form, so that SPICE simulation may be performed or
schematic-layout equivalence can be determined by LVS; and Verilog or VHDL format,
which forms the circuit description for logic simulation. All of these netlist files—EDIF,
SPICE, and Verilog—are text files that can be opened in a text editor, understood, and
modified, if desired. They are quite “readable.”

The Layout Tool


I use L-Edit, a layout tool from Tanner. I also have experience (years ago) with Virtuoso, a
much more expensive Cadence tool, as well as other tools that I have tried but not found to be
particularly interesting or more useful. Free tools exist, but are not worth the trouble of
learning. Probably the most difficult-to-learn tool you’ll ever need in IC development is your
layout tool; choose wisely, as the
time spent learning the tool will end up wasted if you must change to another. I cannot stress
this more severely: ALL of the other tools you need can be understood in a matter of days, but
you can take months to get comfortable with the layout tool, and years to fully understand it if
it is a good one. Layout tools are extremely complex; a good layout editor will have many
ways to do a given task, and valuable options may remain undiscovered until you’ve
completed your third or fourth design. I’m still finding new capabilities in L-Edit, which is
very intuitive, and I’ve been “driving” the tool for almost four years now.
The layout editor allows you to draw shapes on any layer as boxes, polygons (any object
that’s more complicated than a rectangle), or wires (essentially constant width paths); and
move, copy, stretch, delete, merge, attach pin notations, and perform other manipulations, as
well as select a collection of objects and make them into fixed entities called cells. Cells can
then be instantiated along with other objects to create larger, more complicated cells. All cells are
given unique names, and should, at the upper hierarchical levels, be named the same as the
corresponding schematic, for convenience.
The layout editor is a graphical interface, allowing each layer’s drawn objects to be
colored, outlined, and filled as you like. The choice of layer colors and the order in which
layers are rendered will allow you to see multiple layers on top of each other, while retaining
the ability to see through them. Choosing layer patterns and colors properly can greatly
improve your ability to quickly understand a complex layout. Some tools are rather
cumbersome in their ability to quickly modify such layer rendering properties, but I find L-
Edit to be extremely quick and intuitive in this case. A layer palette is provided, from which a
layer can be chosen for drawing. Double clicking on a layer in the palette brings up a dialog
box where every property of that layer can be quickly viewed and modified. Layers can have
many properties.
The layer has a name, which can be any name you wish, but every layer that must exist in
the GDSII file for making a mask must have a GDS number, and this number differs from
foundry to foundry. Foundry A may call the metal1 layer GDS#21, whereas foundry B calls
metal1 GDS#8. The layer name is unimportant to the GDS file, as only the GDS number is
conveyed as valid information. You MUST check these numbers prior to generating the output
GDSII file.
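Because only the GDS number carries information in the stream file, it can help to keep a per-foundry name-to-number map and sanity-check it before generating output. The sketch below is a hypothetical illustration in Python; every layer name and number in it is invented (the real assignments must come from each foundry’s layer map document):

```python
# Map friendly layer names to GDS numbers, per foundry.
# All numbers here are made up for illustration; always take the real
# assignments from the foundry's layer map document.
FOUNDRY_A = {"active": 10, "poly": 20, "metal1": 21, "via1": 25, "metal2": 27}
FOUNDRY_B = {"active": 3, "poly": 5, "metal1": 8, "via1": 9, "metal2": 12}

def check_layer_map(drawn_layers, foundry_map):
    """Return the drawn layers that have no GDS number assigned."""
    return [name for name in drawn_layers if name not in foundry_map]

# Before streaming out, verify every drawn layer has a number:
drawn = ["active", "poly", "metal1", "via1", "metal2"]
assert check_layer_map(drawn, FOUNDRY_A) == []
```

Retargeting the same layout to another foundry is then just a matter of swapping the map, provided the check still passes.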
Each layer has a defined amount of capacitance that drawn objects will have to the
substrate, expressed in terms of both the area of the object, which leads to a capacitance
directly, as well as a perimeter,
which causes a fringe capacitance to exist. These parameters can be associated with each
layer, with values coming from the foundry’s layer capacitance information. Also, each layer has
a sheet resistance associated with it, and this can be entered for objects that will become
resistors. Using this information, the extract tool (EXT) will be able to determine the
resistance values of drawn structures by length and width measurements, referring to the
layer’s sheet resistance value in the calculation.
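As a sketch of how a tool might use these numbers (all coefficient values below are invented for illustration, not taken from any foundry): capacitance to substrate is the object’s area times an areal coefficient plus its perimeter times a fringe coefficient, and a resistor’s value is the layer’s sheet resistance times its length-to-width ratio.

```python
def wire_capacitance(area_um2, perimeter_um, c_area_af, c_fringe_af):
    """Capacitance to substrate in aF: an areal term plus a perimeter (fringe) term."""
    return area_um2 * c_area_af + perimeter_um * c_fringe_af

def resistor_value(length_um, width_um, sheet_ohms_per_sq):
    """Resistance of a straight strip: sheet resistance times the number of squares."""
    return sheet_ohms_per_sq * (length_um / width_um)

# A 100 x 1 um wire with made-up coefficients of 30 aF/um^2 and 40 aF/um:
c = wire_capacitance(100.0, 202.0, 30.0, 40.0)  # 3000 + 8080 = 11080 aF
# A 100 x 2 um poly resistor (50 squares) on a 25-ohm/sq layer:
r = resistor_value(100.0, 2.0, 25.0)            # 1250 ohms
```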
The rendering (display) of layers on the working screen allows many choices of color, fill
style (called a stipple pattern), and outlining. When a layer is selected, it is convenient for it to
appear somewhat different; layers can be set up to appear different in any of the rendering
parameters when an object is selected. Similarly, ports on any layer and the text that is drawn
to name a port can be configured to be of any color or style, in both normal and selected mode.
Although your design may only require 12 layers for GDSII output, the number of layers
required for the tools to work properly can be many times more. Usually, GDSII output layers
are drawn, while others are derived from combinations of drawn layers or even combinations
of other derived layers. Some layers are drawn but are not output in the GDSII file; such layers
are used in defining exactly where a resistor starts and ends, or, perhaps, to define areas (like
the chip’s periphery) where different design rules exist. The derived layers are used by the
design rule checker (DRC) to analyze the circuit for design rule violations, and by the
extraction (EXT) software to properly recognize elements like transistors, capacitors, and resistors.
This layer information is part of a technology file that defines how the layout looks, how
rules are checked, how netlists are to be constructed from the layout, and other variables that
are considered to be relevant to a given process technology. The technology file, containing
the information you’ve collected to represent a given process, can be copied to another
project file, so that the work done during your first design can be the basis for subsequent
designs. This is one of the many reasons for selecting a process carefully and using it as much
as possible in later designs; a technology setup that is structured the way you want is a
significant investment in time.

General Layout Setup


Mask layers are drawn to a minimum manufacturing grid dimension, and the mouse snap grid
can be set to that dimension, or some multiple of it. Only careful review of the fab design rules
will allow you to determine the best snap; rough snap values allow easier placement of cells
and objects without the need to zoom in to verify placement, but a
fairly fine snap may be required to draw some geometries. I use a 0.5-μ snap to draw 1.5-μ
circuits, but 0.05 μ to draw 0.35-μ ICs.
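The snap itself is nothing more than rounding coordinates to the nearest grid multiple; a minimal sketch, using the grid values just mentioned:

```python
def snap(coord_um, grid_um):
    """Round a coordinate to the nearest multiple of the snap grid."""
    return round(coord_um / grid_um) * grid_um

assert snap(1.74, 0.5) == 1.5               # coarse snap for a 1.5-um process
assert abs(snap(1.74, 0.05) - 1.75) < 1e-9  # fine snap for a 0.35-um process
```

Floating-point grids invite tiny rounding residues, which is one reason layout databases store coordinates internally as integer multiples of the manufacturing grid.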
Years ago, the idea of lambda design rules was established (a lambda being a rather coarse
increment that all design rules could be multiples of). The benefits were a larger snap value,
and that as newer processes were developed, their rules could keep the same ratios while only
the lambda dimension was made smaller, through a process of “shrinking.” This really
didn’t work out; newer processes can gain some space-saving advantage by making some
rules slightly different from what a lambda scheme would permit. Today, almost everything is
drawn on a fractional-micron snap to foundry rules, but you may encounter some processes
offered by MOSIS that accept GDSII data files drawn to lambda rules. You do, however, need
to inform MOSIS as to which is being used during project submission.
The rendering engine in the layout tool will attempt to detail every tiny bit of the design as
it is drawn onto the screen during zooms and pans, which can take significant time with
complex designs. The drawing engine can be set to ignore details smaller than a minimum
number of pixels, drastically increasing redraw speed. If your tool seems slow, check for this
feature; on a modern computer you should never have to wait for a redraw. Some tools are just
simply slow; I find L-Edit to be very fast.
A note on zooming: L-Edit uses the mouse wheel to zoom while Control is pressed, and
pans similarly while Shift or Alt is pressed. However, hotkeys can be
programmed any way you like to invoke any of the tool functions. Traditionally, IC design
software ran under UNIX, where Shift-Z would zoom out, Control-Z would zoom in, and Z
alone would begin a box-zoom operation, as in the Virtuoso tool. As PCs have become much
more powerful, layout tools that run under Windows (L-Edit in particular) subscribe to
Windows conventions where Control-Z means undo. Because the mouse buttons are pressed
and held during copies and moves, the mouse wheel is no longer available for panning and
zooming. I strongly suggest that if you use the L-Edit tool, you map Shift-Z to zoom out,
Control-Z to zoom in, and Z alone to begin a box-zoom operation. Further, map the F key to
the zoom-full function. L-Edit is particularly convenient in this case, where an object can be
selected while zoomed in, picked for copy or move, and then one may zoom out and then back
into a completely different part of the design to drop the object into place. When you consider
tools, make sure this kind of flexibility is available, as it can really save time in drawing.
Properly set up, your right hand can drive the mouse while your left hand rests at the lower
left of the keyboard. In this fashion, objects can be drawn and edited very quickly. If you’re
left handed, however, you may want a different scheme.
The layout tool will have varying degrees of automation that can help with drawing objects.
Virtuoso allows parameterized cells to be drawn through a tool called PCells, Tanner does it
with TCells, and Silicon Canvas does it with Magic Cell; each tool has some degree of
automating simple structures, like transistors, guard rings, and so forth. Each tool also has
varying degrees of difficulty in setting up an automated structure, and I find that none of these
drawing automation enhancements are particularly useful. All tools allow a single cell to be
arrayed into multiples in a rectangular shape, which I find adequate for building multiple
element transistors or guard rings. Often, the automated result is not exactly what you want,
and you end up doing it by hand to get it right.
Some drawing tools will also do design rule checks as you draw, but this slows the tool
down, as it must be working hard in the background as you edit. I keep this function off when
I draw. The feature sounds good, but it is not all that helpful.
Some tools allow clicking on an object, whereupon all electrically connected objects will
become highlighted. This is done differently, depending on the EDA tool. In some cases,
individual lines can be selected and they will highlight after a delay (as in Silicon Canvas), but
with other tools, the connections of the entire cell are evaluated, which takes some time, after
which, objects can be quickly highlighted (as in Virtuoso and Tanner).
Another popular layout tool feature is that of placing cells from a schematic into a layout,
where they can be manually routed by following connection fly wires. I usually autoroute such
blocks, since the Tanner tool set comes with an autorouter. If the blocks are too complicated,
you won’t want to route them by hand, even with fly-wire guidance; and if the blocks are
simple, hand layout is easy (actually a lot of fun), so I build those by hand.
Tool manufacturers will advertise features such as these, which seem clever at first sight but
may not end up having much real value in practice. What I have found to be much more
important than the “bells and whistles” that tool manufacturers tout is a layout tool that is
easy to work with, well documented, and has a good, intuitive user interface.
My favorite tool so far is L-Edit, because it’s quite inexpensive, very “polished” in its
presentation, and has many conveniences that, once learned, you’ll use frequently. Further,
L-Edit can come with added features (at increased cost) like DRC, EXT, and the autorouter,
SPR. The SPR package does not produce the densest results, and is limited to only three metal
layers, but the convenience of drawing a schematic and getting a complete layout in minutes is
simply spectacular. The entire package is priced around $20,000; many autorouter packages
alone start at many times the cost of the complete Tanner package.
L-Edit has a set of internal functions, accessible through the user programmable
interface (UPI), that allows custom automation when desired. The system allows C code to be
written using these functions, and is ideal for programming ROMs from externally derived
data tables.
This is not an advertisement for the Tanner tool set, as there are advantages found with
other tools, especially at the 0.13 μ and finer levels. In fact, if you’re expecting to do very
deep sub-micron work, you should consider much more expensive tools; and they are MUCH
more expensive. Perhaps a rule of thumb would be that your total EDA tool cost should be on
the order of the cost of a few mask sets.

Design Rule Checking (DRC)


The design rule checking software is a simple tool that analyzes the design against a set of
design rules that you enter manually or get as a package from the foundry. Running DRC is
simple: the software will identify and zero in on errors, and you can fix them one at a time.
Setting DRC up, however, can be a real headache, as it is very much involved with derived layers.
If you’re laying out a logic-only IC, then a standard cell library and a DRC setup from the
foundry might be all that you need, but if you want to include any analog circuits that really
give your design value, then you might as well go through the DRC rule definitions, arrange
the rules the way you like, and be in full control of your project. If you draw objects, they
must subscribe to the fab rules, so carefully reading through the design rule specification and
writing DRC checks manually will get you up to speed very quickly.
When you write the rules the first time, you may find the process tedious and complicated,
but once you’ve done it, like everything else, the next time will be much easier. Many
foundry processes have very cryptic names for both the DRC rules and the layer names.
Also, a process that contains extra mask layers that you don’t use can have those layers
stripped away (by you) to make the final checks run faster. Unfortunately, modifying a
foundry layer list and DRC definitions is like trying to understand someone else’s
uncommented C code, and you can get into real trouble trying to “simplify” a fab’s rule
set. I find that it is best to accept the fact that you’ll need to know the rules to use them, so a
careful read of the rule set and the careful construction of exactly what you need in terms of
layers, both drawn and derived, with names you understand, may be in order. You DON’T
need to do this if you’re just placing foundry supplied standard cells (logic only) and wiring
them up, but if you want to do analog functions (what I would call IC design), you’ll have to
roll up your sleeves and get to work.
I suggest organizing your layers in an order that makes drawing easiest, with drawn layers
at the top of the list, perhaps in order of IC processing steps, drawn layers for component
identification or DRC rule exclusion (peripheral rules) next, followed by derived layers for
DRC rules, and finally derived layers for extracting devices. Often, the layers you derive for
DRC can also be used during device extraction.
All mask layers have simple minimum width/space DRC rules, but a process can also have
rules that are complicated in their derivation. For example, an implant mask must always surround
an active area by a minimum amount, and transistor gates must extend past the transistor
junction by a minimum value. The identification of such geometries is defined in the DRC
setup, and each tool will have varying limitations on what geometrical relationships can be
properly derived for checking.
To do the more complicated checks, one must derive layers from the drawn ones. There are
basic formats for layer derivation, the simplest being the Boolean derivation: the combination
of up to three input layers, in AND or OR fashion, true or false, with the ability to bloat
(increase size in all directions) or shrink (the opposite) any input layer in calculating the
resulting derived layer.
Derived layers may also result from two input layers that meet certain select criteria, such as
how the input layers touch, enclose, or overlap each other. They can also be determined by
the area or density parameters of a given input layer.
DRC checks are then performed on the drawn and derived layers through a list of DRC
rules, each with a specific definition. Each DRC rule has a name that will be used as a
reference when violated rules are pointed out by the layout tool, and your names can be as
obvious or as cryptic as you like. Foundry rules seem to always have cryptic names; name
your rules so that they make sense, but make sure you cover the foundry rules as completely as
possible.
DRC rule definitions concern widths, spacing, the surround of one layer around another, the
extension of one layer out of another, and so forth. DRC rules always compare one layer with
itself (width, space), or a relationship between two different layers. Some rules may be
impossible for your tool to properly interpret, in which case you should be particularly careful
in layout and rely on the foundry DRC to catch errors you cannot check, or you could modify
your rules to be “looser” than the fab requires, while, in the process, gaining some
advantage in rule checking.
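A same-layer spacing check can be sketched for axis-aligned rectangles as below. This is a toy model of what a DRC engine does (real checkers handle arbitrary polygons and corner-to-corner cases); the rectangle format and the 0.5-μ value are illustrative only:

```python
from itertools import combinations

def spacing_violations(rects, min_space):
    """Report pairs of rectangles (x0, y0, x1, y1) on one layer that are
    closer than min_space without touching or overlapping."""
    bad = []
    for a, b in combinations(rects, 2):
        # Gap along each axis; negative means the shapes overlap in that axis.
        dx = max(a[0] - b[2], b[0] - a[2])
        dy = max(a[1] - b[3], b[1] - a[3])
        if dx < 0 and dy < 0:
            continue  # overlapping shapes merge; not a spacing error
        gap = (max(dx, 0.0) ** 2 + max(dy, 0.0) ** 2) ** 0.5
        if 0 < gap < min_space:
            bad.append((a, b))
    return bad

# Two rectangles 0.3 um apart violate a 0.5-um spacing rule:
assert len(spacing_violations([(0, 0, 1, 1), (1.3, 0, 2.3, 1)], 0.5)) == 1
```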
The process of developing a DRC rule set and of creating derived layers is entangled. It will
take some time to make your layer set as efficient as possible, and it’s easy to get lost in
creating yet another layer to use in yet another DRC rule check. Plan the work carefully, and
take the time to find a minimum of layers that can be used in as many rules
as possible. Your DRC checks will run faster, and your layer list will become more
manageable.
An example of a difficult rule would be that of wide metal rules, where the space between
two objects on a metal layer would normally be, say, 0.5 μ, but the spacing between any
metal1 object and a wide metal1 object (wider than 10 μ) would be, say, 1.0 μ. In this case,
you need to create a derived metal1 layer, let’s call it M1shrink, that is equal to metal1 after
having been shrunk on all sides by 5 μ, then another derived layer called, say, M1bloat, which
is M1shrink bloated by 5 μ on all sides. The shrinking and subsequent bloating causes small
objects to disappear, so M1bloat now contains only the metal1 objects that are wider than 10
μ. We use that layer to compare with all metal1 objects to do the 1.0-μ check. In the process,
we also need to specify in the DRC rule that all coincidences of the two layers should be
ignored. Whew! Some DRC rules will drive you nuts, causing an explosion of derived layers;
but most are very simple.
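For rectangles, the shrink-then-bloat trick is easy to model: shrinking by half the width threshold makes narrow shapes vanish, and bloating the survivors restores the wide ones. A simplified sketch (real derived-layer engines operate on arbitrary polygons, and merging of bloated shapes is ignored here):

```python
def shrink(rects, d):
    """Shrink each rectangle (x0, y0, x1, y1) by d on all sides,
    discarding any that collapse to nothing."""
    return [(x0 + d, y0 + d, x1 - d, y1 - d)
            for x0, y0, x1, y1 in rects
            if x1 - x0 > 2 * d and y1 - y0 > 2 * d]

def bloat(rects, d):
    """Grow each rectangle by d on all sides."""
    return [(x0 - d, y0 - d, x1 + d, y1 + d) for x0, y0, x1, y1 in rects]

metal1 = [(0, 0, 20, 20),   # 20 um wide: survives the 5-um shrink
          (30, 0, 34, 20)]  # only 4 um wide: disappears
m1bloat = bloat(shrink(metal1, 5.0), 5.0)
assert m1bloat == [(0, 0, 20, 20)]  # only the wide object remains
```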
Often, a drawn layer is required that identifies the outside edge of the die, and in some
cases, bonding pads too. This allows the DRC to check the periphery by different rules, and
can mean that you must define derived layers for your metal, contact, and via layers, both
inside the chip and at the periphery, as different rules may apply. Take your time and plan
well.
You may wish to greatly simplify your work by creating drawn layers that are not exported
to the GDSII file, but simplify the drawing process; you would then have the tools generate
the GDSII layers prior to tape-out. Usually, you would draw a transistor by defining an active
area, drawing the gate poly across it, then adding an implant mask (N or P), and possibly other
masks that are always associated with transistors, which surround the active area by some
minimum amount. Alternatively, you could draw from layers named NDIFF and PDIFF, and
have the tools generate the active area and implant masks for you. This technique makes
drawing quicker and the layout less cluttered, but the design’s database, after the layers have been
generated, is much larger than before, and may cause problems at the very end of your design
process (which is the worst time for trouble to pop up).
My suggestion is that if your designs are small, this is a great technique, but for large
designs (over one million transistors), you may be better off drawing all of the layers by hand,
as the file after layer generation may be too large for the other tools to handle. Alternatively, if
the entire design is constructed in repeated blocks, such as memory and I/O pads, the layer
generation can be done to each block, lowering the database size.
Two other, now standard, formats exist for DRC checking; the Dracula format developed by
Cadence and the Calibre format from Mentor
Graphics, which have more advanced geometrical relationship capability. These rule sets only
use the GDSII output file numbers for DRC checking, ignoring any extra drawn or derived
layers you may be working with, and ignoring any special names you have given to your
layers; only GDSII layer numbers are used. This is great, as you’re only checking what
you’ll be delivering to the foundry, and you can organize the layers that you draw, and name
them any way you like. It’s particularly convenient that the Tanner tool set with HyperVerify
allows either of these DRC rule sets to be used as final checks, although the checking may
take longer than the custom-built DRC rules you would routinely use in the course of
developing a design.
Also, since foundries only offer a few setups that you can use, typically Cadence versions,
Tanner has included a means by which Virtuoso setups can be imported directly into L-Edit. If
you use L-Edit, you’ll find that you can delete the pin layers (or move them out of the way,
further down in your layers list), as Virtuoso requires separate layers for pins and L-Edit does
not. If you import a setup directly, you can change the colors from the Virtuoso stipple
patterns to L-Edit patterns, which are much more flexible, allowing layers to be rendered
directly, added to or subtracted from other intersecting layers, and rendered in any sequence.
This very much improves the look of the layout and allows multiple layers to be visible
without confusion.
Once a Dracula or Calibre file has been loaded, the text editor within L-Edit can allow
editing of the files while showing syntax highlighting. This allows new rules to be checked for
correctness prior to running. Through the L-Edit editor, layer derivation can be displayed
easily, and a single rule check can be executed from the editor, instead of having to run the
entire set.

Extract
The extract tool (EXT) is used to turn your layout into a netlist of transistors, resistors, and
capacitors. Remarkably, the process is quite easy from the user’s standpoint: all you need is
an extract definition file that tells the EXT tool how to recognize devices and how they are
interconnected.
An extract definition file (.ext) is a simple text file that declares how layers are
interconnected, such as a via connecting M1 and M2, or P diffusion (not in an N well)
contacting the substrate. The rules for such interconnections are very simple.
Devices are extracted by the use of derived (or sometimes drawn) device recognition layers.
When EXT is run, the tool looks for objects on the recognition layers, determines the device
type and its parameters, and writes them to an output file, along with interconnection
information. For example, you may establish a derived layer called NTRAN, which
may be the coincidence of active area, N implant, and POLY, for the purpose of identifying
NMOS transistors. You’ll also have to identify the source and drain regions of the devices,
so you may need a layer called, say, NSD, which is any N implanted active area that is not
covered by POLY. The definition for the NMOS transistor would declare the recognition layer
as NTRAN, the gate as POLY, and the source and drain regions as adjoining NSD areas.
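These recognition-layer derivations are plain boolean operations. Modeling layers as sets of grid cells (a deliberately crude stand-in for real polygon booleans; the geometry below is invented), NTRAN and NSD fall out as:

```python
# Layers as sets of (x, y) grid cells; layer names follow the text's example.
active   = {(x, y) for x in range(6) for y in range(3)}
nimplant = set(active)                                 # implant covers the device
poly     = {(x, y) for x in (2, 3) for y in range(3)}  # gate strip crossing active

# NTRAN: coincidence of active area, N implant, and POLY -> NMOS recognition
ntran = active & nimplant & poly
# NSD: N-implanted active area NOT covered by POLY -> source/drain regions
nsd = (active & nimplant) - poly

assert ntran and nsd and not (ntran & nsd)  # gate and source/drain are disjoint
```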
Resistors can be tricky, as you may want a drawn layer to identify the resistor areas, and in
the case of poly resistors, you’ll create a layer called, say, POLYWIRE to be all POLY that
isn’t declared a resistor, and you’ll make all of your resistor connections (and in this case
transistor gates too) using POLYWIRE. A resistor layer, say, POLYRES, which is derived
from POLY and the resistor definition layer, connects to POLYWIRE at each end. If you try
to connect to the resistor using the POLY layer itself, the resistor will be shorted out. The
sheet resistance specified for the layer POLYRES will be used in the resistor’s value
calculation.
Alternatively, you could draw resistors in a drawn layer called POLYRES (with the same
GDSII number as POLY), where POLY comes to a stop at the ends of POLYRES structures.
This may be useful, but you’ll probably want POLYRES to be a new color, similar to POLY.
The former technique, although a bit more complicated, is preferred, as the POLY layer is left
intact as a correct and complete mask layer.
Now we see how the layer list, DRC, and EXT all become intertwined in developing a layer
list and rules. This is a strategy that you’ll have to live with as you design; so the choices
you make initially will impact the way you work throughout the project. Identify just what
features you want in terms of resistors and capacitors, and structure the layer list and rules in a
minimal fashion. A few trial layouts and extracts will show where any errors in thinking may
remain; most extract files can be put together in a short time, once the derived layers are
properly organized.
If you have imported a Virtuoso layer setup, you won’t have the derived layers required
for extract, and will have to derive them yourself. Also, you’ll have to create your own
extract file. A typical (although simple) extract file would look like this:
#SAMPLE .ext FILE
connect(nwellwire, ndiff_sd, ndiff_sd)
connect(subs, pdiff_sd, pdiff_sd)
connect(allsubs, subs, subs)
connect(ndiff_sd, M1, CNT)
connect(pdiff_sd, M1, CNT)
connect(polywire, M1, poly1contact)
connect(polycapwire, M1, poly2contact)
connect(M1, M2, VIA)
# NMOS transistor
device = MOSFET(
    RLAYER=ntran;
    Drain=ndiff_sd, WIDTH, area, perimeter;
    Gate=polywire;
    Source=ndiff_sd, WIDTH, area, perimeter;
    Bulk=subs;
    MODEL=NMOS;
)
# PMOS transistor
device = MOSFET(
    RLAYER=ptran;
    Drain=pdiff_sd, WIDTH, area, perimeter;
    Gate=polywire;
    Source=pdiff_sd, WIDTH, area, perimeter;
    Bulk=nwellwire;
    MODEL=PMOS;
)
# Poly capacitor
device = CAP(
    RLAYER=polycapacitor;
    Plus=polycapwire;
    Minus=polywire;
    MODEL=;
)
# Poly resistor
device = RES(
    RLAYER=polyresistor;
    Plus=polywire, WIDTH;
    Minus=polywire, WIDTH;
    MODEL=;
)
#

Your extract definition file need only include those devices you are actually using, which
may be no more than NMOS, PMOS, POLYRES, and POLYCAP, but other devices can be
included as desired. Although you could extract every possible kind of resistor (N diffusion,
P diffusion, NWELL, POLY1, POLY2, or metal), usually only one or two resistor types are
needed; the EXT file becomes smaller, and extract completes more rapidly.
Extract will find capacitors that you have identified in your layout as capacitors, so that they
may be matched against the capacitors in your schematic. However, the extract tool can be set
to add parasitic capacitances from metal and poly lines to substrate, ones that aren’t in your
schematic.
You’ll end up running extract once without parasitics for LVS purposes, then once again
with parasitics turned on, to do SPICE simulation. The parasitic capacitances of MOS
diffusions are always an intrinsic part of the MOS models, and do not require special attention.
Unfortunately, metal parasitics are only calculated to substrate, not to each other. If you
want an accurate accounting of the capacitance between metal layers, you’ll have to identify
those areas and build a separate extract file to find them and place them into your output
netlist. Since these stray capacitances aren’t in your schematic, you can’t use this EXT file
for LVS purposes, but you’ll be able to identify intermetal parasitics. It is my experience that
this is not necessary, and can cause more trouble than it is worth; if you are concerned about
the effects of such coupling, add a capacitance manually into the extracted file prior to SPICE
simulation. Don’t get carried away with the trivial details; they will bury you. Pay close
attention to what you know really matters in the design, and let the rest go.
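Patching such a hand-estimated coupling capacitor into the extracted netlist is a one-line text edit; a sketch (the node names and the capacitor value below are invented for illustration):

```python
def add_parasitic_cap(netlist, name, node_a, node_b, value):
    """Insert a capacitor card just before the .END line of a SPICE netlist."""
    lines = netlist.splitlines()
    for i, line in enumerate(lines):
        if line.strip().upper() == ".END":
            lines.insert(i, f"{name} {node_a} {node_b} {value}")
            break
    return "\n".join(lines)

extracted = "M1 out in 0 0 NMOS W=10u L=0.35u\n.END"
patched = add_parasitic_cap(extracted, "Cpar1", "out", "in", "10f")
assert "Cpar1 out in 10f" in patched
```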
The really neat value of extract is that you can draw a layout, extract it to a netlist, and
perform SPICE simulation directly, without even a schematic existing. Often I will have a
schematic so that I can do LVS, but will make adjustments at the layout level (trimming
resistors, scaling transistors) until the circuit performs the way I want, and only then, after the
circuit works well and fits in a tight space, will I go back and maybe make notes of the final
device sizes and resistor values in the schematic. The layout is where the real details are, not
the schematic. Since the EXT tool can analyze metal capacitance to substrate, and
automatically determine source and drain areas that can constitute loading in dynamic circuits,
the simulation of the layout will always be more accurate than a schematic of assumed values.
Traditionally (hmm, does that word have meaning anymore?), a design engineer with
knowledge of circuit design and analysis would draw a schematic and do full SPICE
simulation with precise device parameters, then throw the result “over the wall” to a layout
person that would otherwise be called a “polygon pusher.” The polygon pusher was of
lower engineering status (gets paid less), and would dutifully attempt to lay out the design that
was presented to him. I think this is nice and tidy, but basically flawed, because anyone
trained in circuit design can certainly learn to draw using a layout tool, and he gets to actually
see the result of his (sometimes silly) design choices as he goes along. I strongly encourage
polygon pushers to become engineers and engineers to become polygon pushers. This cross-
training exercise will undoubtedly lead to better, denser, and more synergistic designs.

Layout vs. Schematic (LVS)


Layout versus schematic (LVS) simply compares the netlist from the schematic tool with that
obtained from the EXT tool. If the two are identical the tool will indicate this very clearly
(circuits are equal), but if they’re different, LVS will give clues as to what’s wrong, but
rarely will it directly point to the problem. Interpreting LVS results is a Zen kind of thing.
The Virtuoso tool compares the netlists by the connection pins and names that have been
attached within the schematic and layout, whereas the Tanner tool will do a topological
matching. There are advantages and disadvantages to each. If the tool matches pin names of
the schematic and layout, then all relevant pins must be named, which I always disliked when
working with Virtuoso. Alternatively, you don’t have to name anything in the Tanner
environment; the tool will simply see that the two netlists are topologically equivalent and
declare them to be identical even if the pin names don’t match. Yes, one can go to an LVS
output file and see that certain pins didn’t match properly; and yes, when the circuit block is
inserted into a larger block, such mismatching will ultimately become evident, but I prefer
knowing that the pins do match. The Tanner tool allows for this by accepting an optional
prematch file, a text file that pairs schematic pin names with their equivalent layout pin names
(which can then be different), but you have to write it yourself. Logic decoders can really suffer from this kind
of mixed-up-pin problem. When layouts don’t match the schematic, you’ll be amazed at
the extent of LVS confusion that can result from a single connection being wrong. It’s like
the floodgates of computer hell burst open and flooded your screen with meaningless
information; you’ll think everything’s wrong, but it usually comes down to just one or two
connections. Don’t despair; somewhere in that knotted ball of twine you’ll find a loose end
to tug on, and once again, all will be right with the world. The secret to LVS is perseverance.

The SPICE Tool


There’s a free SPICE program (Simulation Program with Integrated Circuit Emphasis)
offered by Mike Smith, called WinSpice3, that will work, though it presents some slight irritation
to the user unless a small fee is paid ($25 at last check). For quick circuit development, trying out
new ideas to get rough results, it is excellent. The graphical interface is not as complete as in
other SPICE packages though, and for a better package, be prepared to pay in the range of
$25K. The Tanner tool set comes with a good SPICE program and a very nice viewer that
allows cursor measurements and screen updating as the program proceeds, at a fraction of
the cost of the higher-priced tools.


Although I like Tanner SPICE (Tspice), I often go back to WinSpice3 since it is a bit easier
to use (less formal), and produces quicker results for very simple circuits (filters, oscillators,
standard cells). For PLLs and very complicated circuits, I always use Tspice. If you’re just
getting into IC design, I strongly suggest getting WinSpice3 first, where you can get your
“feet wet” without diving into the deep end. SPICE is one of the great inventions of the
latter half of the twentieth century, as you’ll see when you simulate your first circuit. It is
positively unimaginable how one could design anything today without it. Even if you don’t
get into IC design, at least get SPICE.
The beauty of SPICE is that it deals with both the dynamics of a circuit and the abundant
nonlinearities associated with semiconducting devices. It only responds to what you give it
though, so it won’t properly model the resistance of a wire unless you tell it to, by inserting a
resistor in place of the wire. To SPICE, wires have zero resistance, as do inductors, and
capacitors are lossless. This can lead to unrealistic results (infinite Q) that must be understood.
Whenever you model a circuit in SPICE that involves inductors and capacitors, you must add
losses to obtain realistic results.
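For example, a parallel LC tank driven by an AC current source will show infinite Q unless a loss resistance is inserted. A minimal deck illustrating the point (component values are arbitrary; the series Rs models the coil's winding loss, giving Q = ωL/Rs, about 100 here at the 1.6-MHz resonance):

```spice
* LC tank with series coil loss; delete Rs and Q becomes infinite
I1 0 1 ac 1m
L1 1 2 10uH
Rs 2 0 1
C1 1 0 1nF
.ac dec 100 100k 10meg
.plot ac v(1)
.end
```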
SPICE can analyze circuits as to DC, AC, and transient response, as well as show variations
in circuit performance over temperature. The program will allow a wide range of input signals,
even bit patterns, and the results of a transient response can even be subjected to Fourier
analysis within the SPICE program. Usually, you’ll be looking for the temperature stability
of a bandgap reference, the propagation delay of a standard cell function, or the gain or offset
of an op-amp, but many different analyses can be performed.
When dealing with active devices, such as bipolars or MOSFETs, a model is required to
which SPICE can refer. MOS models are simple text files of cryptic parameter names and
numerical values, and come in various flavors. Your foundry will generally supply only one or
two types. MOS modeling is quite complex, and hopefully you’ll never need to understand
each parameter, but when designing unusual devices, you may find the need to “dig into”
either the model itself, or the SPICE simulator, to set options that would give ridiculous results
if left unchanged.
As an example, the extraction tool will print MOS devices into the netlist with gate width
and length, and the areas and perimeter values of source and drain regions. There is a
parameter in many models named RSH, the sheet resistance of the diffusion between the
source/drain contacts and the gate of the device. If the extracted MOS declaration actually
specifies how many squares of such resistance are present (via the NRS and NRD instance
parameters, which would be unusual), the calculations will work out correctly. Unfortunately,
the default for the number of squares in WinSpice3 is 1.0, and in Tspice is 0.0. If you’re
constructing a very strong MOSFET,

perhaps L = 1 μ, W = 10,000 μ, the simulation will be correct in Tspice, but wildly incorrect in
WinSpice3. The former correctly shows a device that can conduct amperes, while the latter
will show a device with RSH values in series with both source and drain, and with RSH = 50
Ω, maybe 50 mA will be displayed as a maximum drain current. This is just one example of
how the setup of various options in the SPICE environment can interact with the SPICE model
parameters to give seriously misleading results.
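One defensive fix is to set the squares explicitly in the instance line, using the standard SPICE NRS/NRD instance parameters (number of squares of source/drain diffusion, each multiplied by the model's RSH). A sketch, with invented geometry and model values:

```spice
* Very wide NMOS: pin order drain gate source body
M1 d g 0 0 nmos l=1u w=10000u
+ ad=20000p as=20000p pd=20004u ps=20004u
+ nrd=0.0002 nrs=0.0002
* squares = diffusion length / W, nearly zero for a 10,000 u wide device
.model nmos nmos (level=3 rsh=50)
```

With nrd and nrs pinned near zero, both WinSpice3 and Tspice agree, instead of WinSpice3 silently inserting 50 Ω in series with each terminal.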
I don’t suggest fully digesting every aspect of your models, but I do suggest gaining a
sense for what is reasonable and what is not, then tracking down odd results to their source.
I can’t go into a complete description of all the SPICE features here, but I can offer some
encouragement to those who have no experience with SPICE: Any simple circuit, such as an
op-amp, will have inputs and an output, a supply connection and a ground connection, but
internally, there may be as few as three or four other nodes. SPICE allows circuits to be
described by naming or numbering nodes, then using the node names as connections to
devices. The complete SPICE “deck” (yes, from computer punch card days), may look like
this:
*Simple amplifier
.include mos.mod
vp 50 0 5v
vmid mid 0 2.5v
vin in 0 dc=1v
vb bias 0 1.2v
M1 2 bias 0 0 nmos l=1u w=10u
M2 3 in 2 0 nmos l=1u w=20u
M3 4 mid 2 0 nmos l=1u w=20u
M4 3 3 50 50 pmos l=1u w=10u
M5 4 3 50 50 pmos l=1u w=10u
M6 out 4 50 50 pmos l=1u w=20u
M7 out bias 0 0 nmos l=1u w=10u
.dc vin 2.4 2.6 100u
.plot dc v(out)
.plot dc i(vp)
.end

That’s it. We have the first line as a comment area, we declare a model for the MOS
devices we’re using, we establish a power supply voltage, a mid-supply voltage, an input
signal, and a bias voltage, we connect seven transistors (pin order: drain, gate, source, body),
and run the simulation, sweeping our input from 2.4 to 2.6 V in 100-μV increments, and get
out both the output terminal voltage and the power supply current as a function of the swept
input. Further, the line entries can be in any

order (except the first and last lines), and the terminal names can be whatever you like. This is
really simple.

The Logic Simulator


SPICE carefully analyzes every tiny detail of circuit operation, and in dynamic simulations,
there are a huge number of calculations to perform at each instant, precisely balancing the
voltages at each node and the currents flowing between them. Therefore, the simulation of
large logic blocks is possible, but very inconvenient with SPICE. In order to perform fast logic
simulation, the individual logic blocks, (such as gates and flip-flops) are abstracted into
behavioral models where only 0 and 1 (ground and supply) are valid signal voltages. The
behavioral model defines the logic function, the function’s input and output terminals, and
the propagation delay between inputs and outputs. The logic simulator drives the netlisted
circuit with the logic signals you prepare, and only does calculations when a signal transition
is propagating through the circuit. The speed improvement over SPICE is extreme, on the order
of a millionfold.
I’ve found the most universally used simulator for simple circuits in the sandbox range to
be ModelSim, a product now offered by Mentor Graphics, which is priced in the $5000 range.
It comes in two flavors, either Verilog or VHDL. These are languages used in defining
circuitry and stimuli: languages that are peculiar, simply because the processes they control
are event driven, as opposed to the linear, step-by-step languages that we use for general
programming. I use the Verilog version of ModelSim.
Getting used to Verilog can be time consuming, but there are many brief books on the
subject, and code can be found here and there to act as examples. Behavioral models are fairly
easy to construct, and once you get the hang of it, your entire standard cell library can be
modeled in an hour or so. Writing a stimulus testbench can be a challenge, but since
simulation is so fast, you can quickly see what various commands are doing, and modify your
code to suit.
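As a sketch of what such a behavioral model looks like, a 2-input NAND from a standard cell library might be written as below. The rise/fall delay values are invented for illustration; real numbers come from characterizing your own cells.

```verilog
// Behavioral model of a 2-input NAND standard cell.
// #(rise, fall) gives separate rise and fall propagation delays.
module nand2 (out, a, b);
  output out;
  input  a, b;
  nand #(0.4, 0.3) g1 (out, a, b);  // built-in primitive with delays
endmodule
```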
The biggest problem with logic simulation is getting the circuit to start in a known
condition. Logic simulators cannot propagate a known output from a circuit if the logic inputs
are not defined, so flip-flops need to start up in a known condition; this often requires that you
use reset flip-flops in your design, even in instances where full resets may not be required
from a functional point of view. Alternatively, Verilog can set or clear flip-flops by forcing
them, which means you’ll need to get a deeper grasp on the Verilog language to do so. In
any case, if your circuit includes more than 100 gates, and especially long counters, you’ll
want to perform logic simulation. My purchase of ModelSim abruptly ended an embarrassing
sequence of first silicon failures because

I thought that my logic circuits were too small, simple and obvious to require simulation.
Making any logic circuit without logic simulation (or SPICE simulation if that is possible) is
downright hubristic, and you will pay for the arrogance.
Further, when it comes to establishing a test routine for your chip, you’ll need to obtain a
file that reflects the exact logic output expected, at exact moments in time, from a test stimulus
file that you provide to the test facility. This is only done through the use of a logic simulator.
The logic simulation flow is quite simple: You attach attributes to each of your standard cell
schematic symbols that declare which behavioral model applies, output a Verilog netlist from
the schematic tool, create a testbench file of stimuli, include references to your behavioral
model files, allow ModelSim to compile the files into a form that can be simulated, and run.
By the way, ModelSim has a very handy feature, where bus signals with arithmetic values can
be displayed as analog waveforms, which is great when dealing with signal processors.
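A minimal stimulus testbench, just to show the shape of the flow described above (the top-level module name and its ports are invented; substitute those from your own netlist):

```verilog
`timescale 1ns/100ps
module tb;
  reg  clk, rst, din;
  wire dout;

  // your netlisted top-level block (name and ports are hypothetical)
  mychip dut (.clk(clk), .rst(rst), .din(din), .dout(dout));

  always #10 clk = ~clk;        // 50-MHz clock

  initial begin
    clk = 0; rst = 1; din = 0;  // start in a known condition
    #25 rst = 0;                // release reset after the first edge
    #40 din = 1;                // wiggle an input
    #200 $finish;
  end
endmodule
```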

Place and Route Tools


The ability of a program to accept a netlist and accordingly place standard cells and route them
into a complete layout is a wonderful time saver, while often being a real space waster. If your
logic is a jumble of gates, as would be produced from a logic synthesis tool, you will not want
to place cells and wire them by hand, unless the circuit is trivially small. Further, although
your tools may include a means by which cells can be placed into a layout from the schematic,
allowing you to interconnect them as you wish with fly wire guidance, logic circuits beyond,
maybe, a hundred gates will become difficult to manage.
I strongly suggest that since the bulk of logic circuitry is regular and ordered, like
multipliers or register banks, one should consider the layout before even drawing a schematic,
to find ways in which circuit blocks can be structured as arrays of simple cells, as opposed to a
rat’s nest of jumbled logic. The former is compact with predictable wire loading; the latter is
inefficient from every point of view but that of design time. If you need it quickly, autoroute
it, if you need it dense, do it in arrays by hand.
There are, however, circuits that just can’t be reduced to nice and tidy blocks of compact
and regular cells, and these need to be hand routed or autorouted. If you don’t have an
autorouter, you must try to find one; they are hideously expensive. My attraction to the Tanner
tool set goes beyond my appreciation of L-Edit, for the SPR package that works with L-Edit is
extremely valuable. Not without difficulties though; I have a long list of complaints, but the
ability to automatically place and wire up an absolute mess of logic in a very short period of
time is the key to building complex ICs, and SPR is really affordable.


My concerns about the inefficiency of Tanner’s SPR, and some of the unbelievably
“goofy” paths that it produces between cell rows, not to speak of some of the arcane
workarounds that are required to get it to perform well, have driven me to shop for a package
that would do a better job. Wow, what an eye-opener! I’ve had salespeople tell me (in a very
serious tone of voice) that their autorouter would produce dense layouts in multiple layers of
metal, and only cost $500,000 to rent for a year. Other quotes were over a million dollars. Of
course, I’d have to run the software on a UNIX workstation. Right, and I’ll just park the
software package between my Ferrari and my Lamborghini.
The Tanner SPR autoroute tool produces a standard delay format (SDF) output file. The
SDF file is derived using a Liberty file (.lib) that describes the drive capability and input
loading of your standard cells during the autoroute process. The SDF file is associated with
your logic netlist during simulation, whereupon a more accurate logic simulation can be
performed with gate and wire loading, as well as wire resistance being taken into account. It is
very difficult to extract such information from a hand layout, which needs to be analyzed more
carefully, perhaps using SPICE to find critical delays that would not be revealed during logic
simulation.

Logic Synthesis Tools


In the FPGA environment, circuitry is composed of blocks that can be routed by programmed
wiring. The blocks are not standard logic gates, but unique combinations of logic circuits
organized in blocks that can offer a high degree of flexibility to a wide range of customers.
The programming interface is a logic synthesis tool that accepts a circuit function defined in a
register transfer level (RTL) language (a subset of the Verilog language) and produces the
proper programming to make the FPGA perform the desired function. Such FPGA synthesis
packages are specifically limited to the target FPGA, and are offered at low cost as an
incentive for users to design in the FPGA environment.
Logic synthesis tools for use with an arbitrary standard cell library are more expensive (like
LeonardoSpectrum from Mentor, costing in the $20K range), but you may have difficulty
getting your custom-designed cells defined in a form that would make sense to the synthesis
engine, unless you spend another $10K for a compiler. Oh, and that’s the cheapest
synthesizer I could find; others are priced like a mansion on Maui. If you are familiar with
RTL circuit description and have a lot of logic to include in your designs, LeonardoSpectrum
may be a good investment. I prefer the schematic entry method.
I’m told that the majority of ICs today are designed by the synthesis-autoroute method,
where schematics are a thing of the past.


This completely amazes me; maybe because I’m particularly quick with knocking out those
old-fashioned schematics that read clearly and assemble into well-compacted layout blocks. I
very much prefer handcrafting both schematics and layouts, and end up with very dense and
fast silicon.
I recall a time when one of my younger engineers was working on the design of a digital
servo system, with a few asynchronous signals coming in, a few well-calculated control
signals going out, and lots of logical stuff in between. Every time he ran into a timing
problem, he added another layer of pipelining to his multibit datapaths, and when I reviewed
his work mid-project, it was clear that he was getting lost. The solution came down to a few
flip-flops strategically placed to deal with the asynchronous conditions. My feeling today is
that armed with a logic synthesizer, he could have produced a monstrous circuit that would
work, but be huge, and no one, looking at the mass of gates, would be able to realize that it
could have been done far more efficiently.
Logic synthesizers can reduce small logic patches beautifully, but when confronted with
something complicated, like a multiplier, they stumble. In cases like this, the synthesizer
program is trained to look for multiplier functions and substitute predefined structures.
Nonetheless, if you have a tight schedule to meet, and don’t mind that the resulting logic is 4
to 10 times the size of a well thought out and crafted hand layout, synthesis and full autoroute
may be a good choice.
If you want to fit 2 quarts in a 1 pint bottle, however, I suggest planning your projects so
that they consist of tight arrays of easily arrayed cells and interface to small autorouted blocks
of the more random logic. Additional design time worth $20,000 can save $100,000 in mask
costs, or cut your die cost in half.
Through careful planning and clever design, you may find that you need fewer design tools,
can fab on a cheaper process with lower mask costs, and deliver a part that does more than one
that’s slammed together in a rush to market.
A final note on tools: When thinking through a possible arrangement of circuits, insightful
analysis is often beyond the capability of SPICE or ModelSim. C may be the best
programming tool for modeling wild ideas, but I find that the most useful “thinking” tool is
a free program called YABASIC. Windows systems are very complicated, so the days of
simple BASIC programs that one could write in a matter of minutes and get nice graphs from
are gone, except for this easy-to-use program. Since computers are so fast today, you can write
a discrete Fourier transform and get a graphic result from a data file in minutes of
programming time. As an interpreted language, execution is fairly slow, but when graphic
results are required, the short programming time is well worth the wait during execution. The
fast Fourier transform is nice, but it’s twisted, while the discrete version is intuitive and
obvious. A 64K point DFT may

take minutes to execute but also only minutes to write. Further, you can quickly vary each
analysis to suit your needs until you get what you’re looking for.
I’ve used YABASIC to analyze all kinds of things, from lens systems to predicting the
behavior of mechanical systems; if you have a nutty notion that can be modeled
mathematically, light up YABASIC and get deeper insight in no time. This can be done in
C++, but it’s three times the effort, and in these cases you’re looking for a deeper
understanding of your idea, not another lesson in C programming.
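The same quick experiment sketched in Python rather than YABASIC (the point carries over: a direct DFT is minutes of programming, and intuitive where the FFT is twisted). A sine landing exactly on bin 3 of a 32-point record should peak at bins 3 and 29 with magnitude N/2:

```python
import math

def dft(x):
    """Direct discrete Fourier transform: O(N^2), slow but obvious."""
    n = len(x)
    out = []
    for k in range(n):
        # correlate the input against cosine and sine at bin k
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        out.append(complex(re, im))
    return out

n = 32
x = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
mags = [abs(c) for c in dft(x)]  # peaks of N/2 = 16 at bins 3 and 29
```

Vary the analysis to suit: windowing, zero padding, or reading the input from a data file are each a line or two more.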


Standard Cell Design


Once you have your layers list in place, either imported from a foundry directly or
through your own hard effort, it’s time to start making the cells you will use in
hierarchical fashion throughout your IC’s layout. I draw on a black background, but for
printing images in a book, a white background is easier to print and read clearly, so the
designs you see here may look different from the ones you will want to see on your
computer screen. I strongly suggest using a black background, which is easy to produce;
if your background is currently white, go to your colors palette and change black to white
and white to black. It’s that simple.
For the purpose of conveying an understanding of how to design, we will need a list of
layers, colors, and rules, and since foundries forbid the open disclosure of their rules, I’ll
make up an imaginary set of rules. If you would like, you can send your imaginary
GDSII files to the Sandbox Foundry; they will manufacture imaginary wafers for you, for
which they happily accept imaginary cash.
The purpose of this exercise is to show where problems can occur early on, so that
you’re not weeks into a design only to find that you have to start all over again. I’ve
started over several times, so this is a subject I can write about with great authority.
The rule set for this process is intentionally simplified. Normally, cryptic but simple
names are given to the rules, which is convenient when printing up a “cheat sheet” that
shows them in a layout. I’ll try to make my names clear enough to not require rule
drawings.


The Sandbox Rule Set


Minimum Well width 3.0 μ
Minimum Well spacing 4.0 μ
Minimum act width 0.7 μ
Minimum act spacing 0.8 μ
Minimum Well surround of Nact 0.2 μ
Minimum Well space to Pact (outside Well) 0.2 μ
Minimum Well surround of Pact 1.6 μ
Minimum Well space to Nact (outside Well) 1.6 μ
Minimum Nimp surround of Nact 0.3 μ
Minimum Pimp surround of Pact 0.3 μ
Minimum Nimp width 0.8 μ
Minimum Nimp space 0.8 μ
Minimum Pimp width 0.8 μ
Minimum Pimp space 0.8 μ
Overlap of Pimp by Nimp not allowed
Active without Pimp or Nimp not allowed
Minimum Poly width 0.6 μ
Minimum Poly space 0.8 μ
Minimum extension of Poly beyond Act 0.6 μ
Minimum extension of Act beyond Poly 0.8 μ
Minimum space field Poly to Act 0.2 μ
Exact size of CNT 0.6 μ
Minimum CNT space 0.6 μ
Minimum Act CNT to Poly space 0.4 μ
Minimum surround of CNT by Act 0.3 μ
Minimum surround of CNT by Poly 0.2 μ
Poly CNT over Act not allowed
Minimum Poly CNT to Act space 0.4 μ
Minimum M1 width 0.6 μ
Minimum M1 space 0.8 μ
Minimum M1 surround of CNT 0.2 μ
Exact VIA size 0.6 μ
VIA space 0.6 μ
Minimum M1 surround of VIA 0.3 μ
Minimum VIA to CNT space 0.6 μ
VIA on CNT not allowed
Minimum M2 width 0.8 μ
Minimum M2 space 1.0 μ
Minimum M2 surround of VIA 0.2 μ
Minimum PAD dimension 15.0 μ
Minimum M1 surround of PAD 4.0 μ
Minimum M2 surround of PAD 4.0 μ
Minimum VIA surround of PAD 1.0 μ

Note: Exact size VIA not required under PAD openings.

Minimum Poly2 width 2.0 μ
Minimum Poly2 space 2.0 μ
Minimum Poly2 to Poly CNT space 1.0 μ
Minimum Poly2 surround of CNT 0.8 μ
Minimum enclosure of Poly2 by Poly for Cap 1.5 μ
Process manufacturing grid 0.05 μ

This is a simple process, dual-metal, dual-poly, no high value resistors, but well integrated,
so we may make nice layouts easily. This is not so common in the real world; many processes
are poorly integrated, but perhaps a bit tighter in their rules. Tight rules may allow high
density in specific cases, but often can’t be fully realized in practical layouts due to other
rules that cause conflicts. A well-integrated process takes drawing convenience into
consideration; in such a case, cursor snap dimensions are large and everything fits neatly. This
process doesn’t have high-value resistors available, nor does it allow stacked vias, but vias
can be placed without regard to POLY locations. This is not always the case in commercial
0.6-μ processes.

Organization of the Design Rules


Let’s go through the rules briefly, to gain an appreciation for why they exist. Width and
spacing rules make sure that you don’t draw an object too narrow or too close to another
feature on that layer. These rules are determined by the photolithographic process that images
onto the wafer and the etching process that removes unwanted material. If two objects on the
same layer come too close, they will merge into a single object, which basically means two
objects shorted together. If the layer is too narrow, it either won’t function as expected, or in
the case of poly and metal runs, they may cease to exist.
The Well surround of Nact rule recognizes that the Nwell is contacted by N diffusion (Nact
= Act and Nimp), and that the Nwell must surround the contact by this minimal amount to
ensure that good contact is made.


The Well space to Pact rule is similar; this controls how far the well must be from substrate
contacts.
The Well surround of Pact and the Well spacing to Nact rules both make sure that PMOS
devices are far enough inside the Nwell to work properly, and that the NMOS devices are far
enough outside the Nwell. Since the well is driven into the substrate deeply by high
temperature diffusion, it diffuses out from its drawn dimension somewhat, and is poorly
defined at its edges.
Nimp and Pimp surround of Act (active area) rules make sure that even if the implant mask is
slightly misaligned relative to Act, all of the active area still gets the proper implant.
The Minimum extension of Poly beyond Act rule is also called an endcap rule; it is important
that the gate poly fully overlap the end of active, so that it can completely cut off conduction
between source and drain, despite slight misalignment of the masks during imaging. The
Minimum extension of Act beyond Poly rule makes sure that source and drain regions are wide
enough to carry currents reliably.
Contacts are of an exact size so that they can be filled uniformly during isotropic metal
deposition, which leaves a fairly flat surface. Larger contact holes would cause a deep
depression in the finished metal surface. This restriction on exact contact dimension should
not necessarily apply to the seal ring, where often the contact can be drawn as a stripe,
provided the stripe is of exact width.
The Minimum Act CNT to Poly rule makes sure that when the contact holes are patterned
and etched, poly is not touched while making a contact to source or drain. This would
constitute a short between gate and source or drain of the MOSFET.
The surround rules make sure that despite mask misalignment, the structures can be
functionally produced.
The Minimum Poly CNT to Act space makes sure that the metal contacting Poly does not get
too close to the active area; also, it is forbidden to place contacts onto poly if it is above the
active area. This is because the metal will diffuse into the polysilicon locally, and often
“spike” through the thin oxide and short to the MOSFET channel.
In this process, Poly2 is deposited after the first poly layer, so Poly2 will be the top layer of
a capacitor. Poly2 is usually thicker than Poly and has a greater spacing to Poly than Poly does
to active area, so spiking is not a problem when contacting Poly2 in a capacitor structure.
Finally, the “process grid” is 0.05 μ. This is the database unit that will be used by the fab
in making the masks. All objects should be drawn such that they lie on this grid or multiples of
it. Your mouse snap should always be set to some multiple of the manufacturing grid. Circles
and arcs should be avoided in your design, but if included, will be turned into stepped curves
on the manufacturing grid. If your design contains


objects that are not on this precise manufacturing grid, the mask maker will “force” them to
the nearest grid increment, which could cause problems. Make sure all edges and corners of
your objects are on the manufacturing grid.
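A toy illustration of the grid constraint (the 0.05-μ grid comes from the rule set above; round-to-nearest is what the mask maker's "force to grid" amounts to):

```python
GRID = 0.05  # manufacturing grid in microns, from the Sandbox rule set

def snap(coord_um, grid=GRID):
    """Force a drawn coordinate onto the nearest manufacturing-grid increment."""
    return round(round(coord_um / grid) * grid, 6)

def on_grid(coord_um, grid=GRID):
    """True if the coordinate already lies on the grid (no forcing needed)."""
    return abs(snap(coord_um, grid) - round(coord_um, 6)) < 1e-9
```

An off-grid edge at 1.03 μ gets forced to 1.05 μ, moving it 0.02 μ; that is the kind of silent shift that can create a spacing violation the mask maker never reports.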
You might notice that since all active areas need covering by either Nimp or Pimp, only one
needs to be drawn, the other being a derived layer, simply defined as “not” the first layer.
We could take this approach, but we would have to instruct the tools to generate the second
layer before we export the GDSII file. When generating GDSII layers at the end of a design,
the derived layers no longer have the benefit of hierarchy, so objects produced by derivation
will exist as many individual objects on the top level. This will create a large output database,
and may cause difficulty when used with other tools (on account of the database size). For this
discussion, we’ll draw both layers.
The basic layers have resistance characteristics:

M1 sheet resistivity 0.05 Ω/sq
M2 sheet resistivity 0.03 Ω/sq
Poly sheet resistivity 40 Ω/sq
N+ sheet resistivity 120 Ω/sq
P+ sheet resistivity 160 Ω/sq

The poly-poly cap value is 1.0 fF/μ².
The gate capacitance is 2.4 fF/μ².
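These numbers plug straight into the usual back-of-envelope formulas: resistance is sheet resistivity times squares (length over width), capacitance is areal density times area. A quick sketch using the values above (the example geometries are arbitrary):

```python
# Sheet resistivities (ohms/square) and cap densities (fF/um^2), from the text
SHEET_OHMS = {"M1": 0.05, "M2": 0.03, "Poly": 40.0, "N+": 120.0, "P+": 160.0}
CAP_FF_PER_UM2 = {"poly-poly": 1.0, "gate": 2.4}

def wire_resistance(layer, length_um, width_um):
    """R = sheet resistivity * number of squares (L/W)."""
    return SHEET_OHMS[layer] * length_um / width_um

def cap_ff(kind, area_um2):
    """Capacitance in fF from areal density."""
    return CAP_FF_PER_UM2[kind] * area_um2

r_poly = wire_resistance("Poly", 100, 0.6)  # 100 u of minimum-width poly
c_cap = cap_ff("poly-poly", 20 * 20)        # a 20 u x 20 u poly-poly cap
```

The poly run comes out near 6.7 kΩ, which is why long poly wires are a bad idea; the 20 μ × 20 μ cap is 400 fF.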

From the list of rules, it looks like we’ll have to derive layers for DRC checks and extract
files, and we’ll need to include the Sandbox Foundry GDSII numbers. Also, drawn layers
will need colors and some patterns.

Drawn and Derived Layers


Well Drawn GDS#1 DkGrn (light)
Act Drawn GDS#2 LtGrn
Poly Drawn GDS#3 Red
Poly2 Drawn GDS#25 Orn
Nimp Drawn GDS#4 Grn (light)
Pimp Drawn GDS#5 Purp (light)
CNT Drawn GDS#6 Gry
M1 Drawn GDS#7 Blu
VIA Drawn GDS#8 Orn
M2 Drawn GDS#9 Yel


PAD Drawn GDS#10 Blk outline
RES Drawn Gry outline
ICON/OUTLINE Drawn Yel outline
SUBCKTID Drawn Red outline
NOTES Drawn Wht
Subs Derived as: not Well
Nact Derived as: Nimp and Act
Pact Derived as: Pimp and Act
PolyWire Derived as: Poly and not RES
PolyRes Derived as: Poly and RES
Ntran Derived as: Nact and PolyWire
Ptran Derived as: Pact and PolyWire
Ndiff Derived as: Nact and not PolyWire
Pdiff Derived as: Pact and not PolyWire
CAP Derived as: PolyWire and Poly2
PolyCon Derived as: CNT and PolyWire not Poly2
Poly2Con Derived as: CNT and Poly2
ActNoImp Derived as: Act and not Nimp and not Pimp
PimpAndNimp Derived as: Pimp and Nimp
PolyConAndAct Derived as: PolyCon and Act
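The derivations above are plain boolean operations, so they can be sketched with set algebra. This toy model (my own illustration, not any real tool's API) rasterizes each drawn layer as a set of occupied grid cells:

```python
# Layer derivation as boolean set operations on crude rasterized layers.
def derive(act, nimp, pimp, poly, res):
    """Return a few of the derived layers from the table above."""
    poly_wire = poly - res        # Poly and not RES
    poly_res  = poly & res        # Poly and RES
    nact      = nimp & act        # Nimp and Act
    pact      = pimp & act        # Pimp and Act
    ntran     = nact & poly_wire  # Ntran: channel where gate crosses Nact
    ndiff     = nact - poly_wire  # Ndiff: Nact not covered by gate poly
    return {"PolyWire": poly_wire, "PolyRes": poly_res, "Nact": nact,
            "Pact": pact, "Ntran": ntran, "Ndiff": ndiff}

# Tiny example: a 3-cell Act strip, N-implanted, with Poly crossing one cell.
act  = {(0, 0), (1, 0), (2, 0)}
nimp = {(0, 0), (1, 0), (2, 0)}
pimp = set()
poly = {(1, 0), (1, 1)}  # gate crossing the middle cell
res  = set()

layers = derive(act, nimp, pimp, poly, res)
print(layers["Ntran"])   # the channel under the gate: cell (1, 0)
print(layers["Ndiff"])   # the source/drain cells (0, 0) and (2, 0)
```

Real tools perform the same logic on polygon databases, but the extraction behavior (finding the transistor where gate poly crosses active) is exactly this intersection.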

The derived layers will allow extract to find and interconnect devices, and certain DRC
checks will need derived layers like PolyCon, ActNoImp, and PimpAndNimp.
Thankfully, The Sandbox Foundry does not specify wide metal rules.
The first thing we’ll do is set our drawing snap to 0.1 μ, because after looking through the
rules, a coarser setting would not do. Our layers will look like Figure C5.1 in the color section.

Simple Cells
Now, we’ll draw a few cells that will help greatly in the future. The first four cells in Figure
C5.2 (color section) make all of the possible connections to diffusions or poly, the last one
connects M1 to M2. PCON is an exact-sized contact, with a minimum surround of M1, a
minimum surround of Act, and a minimum of Pimp surrounding Act. The same goes for
NCON, but the Nimp layer is used. PYCON is simply a contact with minimum M1 and Poly,
same with PY2CON, using Poly2 instead, and the VIA is an exact-sized via surrounded by
minimum M1 and M2. With this set of cells, you should never draw a contact or via again,
with the possible exception of bonding pads and peripheral rings.

When drawing the cells, set the origin to the lower left corner of the contact/via, or at its
center. The choice you make will depend on the rule set. If all objects, after applying the rules,
are an even number of grid snaps across, it doesn't matter. However, imagine if the VIA
dimension were 0.7 μ instead of 0.6 μ; in this case, we may want to place the VIA's origin at
its center, using a 0.05-μ snap (just for drawing that cell), so that our vias could later be
aligned with our contacts. We will need this feature later when we draw compact transistors.
This is only one example of how reading the rules, and imagining how your design will
proceed, can reduce potential agony down the road.

Standard Cell Design Issues


The issues involved in standard cell design are numerous and intertwined. Unfortunately, the
following will jump around a bit, from one subject to another, but trust me, these issues are
all related in the end.
Using these cells, let’s draw an inverter (Figure 5.1), the layout of which is found in
Figure C5.3 of the color section.
A is the input, Y is the output. When A is at ground potential (0), the PMOS device is on,
the NMOS is off, and Y is pulled to the supply. When A is at VDD, the NMOS device is on,
the PMOS off, and Y is pulled to ground. Most logic functions are inverting in nature. The
top and bottom M1 stripes are called the “rails,” and in this case are 3-μ high. The entire
cell is 14-μ high, from the bottom of the ground rail to the top of the VDD rail. The Nimp and
Pimp layers have been drawn around the MOSFETs, and extend a bit further on the sides than
the obligatory 0.3 μ defined in the rules, because the cell is intended to be abutted to

Figure 5.1 Basic inverter schematic and symbol.

cells to the left and right, without violating any rules. The rule that we must abide by in this
case is the Act spacing rule of 0.8 μ, so each side of the cell is extended by 0.4 μ past the Act
layer of the MOSFETs. The Act strip across the bottom is P implanted to connect GND to the
substrate, while the strip across the top is N implanted to connect the well to VDD. These are
contacted in such a way that abutting cells will not violate any rules. Such contacts to well and
substrate are required in every standard cell, even if the connection is through a minimum-
sized diffusion feature.
Also notice that the input and output connections are made with the VIA cell, as signal
wiring to the standard cell will be exclusively through M2. In an autoroute using this two
metal process, M1 will be used as horizontal stripes running between cell rows in wiring
channels (or wiring bays), and M2 will be placed vertically to connect cells to those M1
stripes.
The N device has a width of 2.9 μ and the P device has a width of 3.9 μ, which makes the
cell pull down better than it can pull up. A well-balanced inverter would have a P/N width
ratio of perhaps 3:1, but this is inefficient. The ratio 2:1 would be a reasonable trade-off, but
our cell height will have to grow to accomplish this, as the N device can be made narrower,
but not while reducing cell height.
Sweeping the input from ground to supply, we see the inverter’s DC response in SPICE,
in Figure 5.2.

Figure 5.2 DC response of inverter.

We see that the inverter has significant gain over an input range of approximately 2.0 to 2.3
V. Beyond this range, gain is reduced dramatically. We could consider the input threshold
voltage to be 2.15 V, which is perhaps not ideal; a better balanced inverter would have a larger
P/N width ratio, and a threshold voltage of VDD/2, or 2.5 V. To achieve this we will need to
nearly double the width of the P device. Increasing the P device width, however, will increase
the capacitive input loading of the cell.
CMOS processes can be characterized by the ring oscillator, an odd number of inverters in a
loop. The period of oscillation is divided by twice the number of inverters to obtain an inverter
delay value.
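The ring-oscillator arithmetic is worth making concrete. A short sketch (the 31-stage ring and 12.4-ns period are assumed example numbers, not measured Sandbox values):

```python
# Extracting per-inverter delay from a ring-oscillator measurement.
def inverter_delay(period_ns, n_inverters):
    """Each full period traverses the loop twice (one rising and one
    falling transition per stage), so delay = period / (2 * N)."""
    return period_ns / (2 * n_inverters)

# Assumed example: a 31-stage ring oscillating with a 12.4-ns period
# implies a 200-ps delay per inverter.
print(inverter_delay(12.4, 31))
```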
When simulations of inverters with various P/N width ratios are performed, it is found that a
broad minimum occurs in terms of propagation delay. This is because CMOS is basically
inverting in nature; a signal propagating through a logic network will alternately be delayed by
an N device driving a load, then a P device driving another load. Increasing the P width within
all cells improves the rise time out of a cell because of greater P device strength, but slows
fall times due to increased cell input load capacitance. Therefore, a somewhat imbalanced
P/N drive current ratio is acceptable. Figure 5.3 shows a plot of inverter delay times as a
function of N and P size ratio. Certain cells, such as the NOR function, will suffer from many
P devices in series, which makes the rise time out of such cells very long. Attempts to resolve
this problem through cell design can be futile; it is better to accept the

Figure 5.3 Plot of propagation delay through an inverter, measured by allowing an odd
number of series-connected inverters to self-oscillate.

fact, and use NAND cells preferentially over NOR cells, and take care not to overload NOR
functions.
The inverter will draw supply current when transitioning from one logic state to the other.
Figure 5.4 shows a DC plot of supply current as a function of input voltage for the inverter.
When the input voltage is at full supply, the N device is on, the output is low, and the P
device is off. The exact opposite happens when the input is grounded. When the input of a
logic function is other than these two logical extremes, current is drawn from the supply,
because both devices are somewhat “on” at the same time. It is imperative that such quasi-
logic levels not be allowed in digital circuits, or excessive supply current will result. Also,
slow logic signal transitions into logic functions will cause overall chip current consumption
to rise, as devices are “fighting” each other through the duration of a logic transition. Low
fanout designs run faster and, due to fast transitions, leave logic cells in the “quasi” state for
a shorter period of time.
When we build a library of standard cells, we want them all to abut in any combination, so
abutting features like the well and the implant masks should align across all cells in the
library. Since we haven't built the more complicated functions yet, we still don't know
whether 14 μ is a reasonable cell height. If we start building more cells, at some point we
may run into a situation that demands the cell height be increased or the well edge position
changed, and we

Figure 5.4 Supply current of inverter as input is swept from GND to VDD.

would have to rebuild all of the ones we thought we were finished with. Usually the reset flip-
flop or the full adder will be the worst of the bunch.
One sizing issue is the width of the rails, as they must power the cells of an arbitrarily long
row of abutted cells, and they take up M1 wiring space within the cell. The maximum length
of an abutted row should not exceed, perhaps, 300 to 500 times the rail width. If such rows are
connected to strong power and ground at both ends (not just one end), the resistance from one
end to the center of a 900-μ long row would be 450/3 squares of metal, or 150 squares
(approximately 15 Ω at 0.1 Ω/sq) from each end to the center. Two such resistances in parallel
(one in each direction) gives us 7.5 Ω from the center to the VDD supply.
This inverter cell is 4-μ wide, so a 900-μ run could contain some 225 inverters. Each inverter
(as a typical cell) can conduct as much as a milliamp, but the current would be distributed
across the width of the row (assuming all inverters transition simultaneously), and we
might assume a peak current of 100 mA at the center. The maximum voltage drop at the center
of the 900-μ long row would then be on the order of 0.75 V. This is not so alarming as a
variation of the 5 V supply, but it can cause serious issues when it happens to the ground rail,
as the substrate connections will couple that transient voltage “bump” to the substrate, and
every other device on chip. Granted, it is doubtful that all devices along the rail will draw peak
current at exactly the same instant, but in an autoroute there’s no guarantee that this won’t
happen. Since the resistance of metal layers and the lengths of metal runs are both finite, noise
injection into the substrate cannot be eliminated, but it can be minimized. Be particularly
concerned when your transient voltage calculations exceed a few hundred millivolts (see
“Latchup” in Chapter 6).
Connecting power or ground at just one end of a long row increases the severity of the
problem by a factor of 4: a 900-μ long, 3-μ wide ground rail measures 30 Ω end to end, and
the voltage drop in the above worst-case example becomes some 3 V. You may wish to
“dress up” your generated autoroutes by hand to make sure they are properly powered.
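The rail-drop arithmetic above can be sketched directly. This follows the worst-case example in the text, using the 0.1 Ω/sq figure the example works with:

```python
# Sketch of the rail IR-drop arithmetic, values as in the example above.
def rail_resistance(length_um, width_um, sheet_ohms_per_sq):
    """End-to-end resistance of a rail: (L / W) squares times Rsheet."""
    return (length_um / width_um) * sheet_ohms_per_sq

# 900-um row, 3-um wide rail.
r_end_to_end = rail_resistance(900, 3, 0.1)   # 300 squares -> 30 ohms
r_half = r_end_to_end / 2                     # 15 ohms, one end to center
r_center_both_ends = r_half / 2               # 7.5 ohms (two paths in parallel)

i_peak = 0.1  # 100 mA assumed peak at the row center
print(r_center_both_ends * i_peak)  # ~0.75 V drop, rail fed from both ends
print(r_end_to_end * i_peak)        # ~3 V worst case, rail fed from one end
```

The factor-of-4 penalty for powering only one end falls straight out of the resistance arithmetic: double the squares to the supply, with no parallel return path.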
The most limiting space factor within the standard cell is the spacing from well edge to the
PMOS devices inside the well, and the NMOS devices outside the well; in this case, 1.6 μ
each. As a result, doubling the width of both NMOS and PMOS devices will not double the
cell height, but only increase it (in this case) by maybe 40%. The question of how large
standard cell devices need to be has one answer: it depends on wiring length. The speed of
logic circuitry depends on drive current from a cell, loaded by wiring capacitance and the
capacitive loading of other cell inputs. A note of caution here: do not attempt to make the
devices within a cell too small, as the effective width of the device will always be a bit smaller
than the drawn width. This is because of certain processing issues that cause the field oxide to
encroach into the

transistor area, making the effective width a few tenths of a micron smaller than the GDSII
information would indicate. Rely on SPICE to reveal this to you with a few simulations before
you draw an entire library.

Fanout and device sizing


Fanout is the number of standard inputs driven by a single standard output.
High fanout means cells are more capacitively loaded, and will propagate signals more slowly.
Low fanout designs are of course faster than high fanout designs, but are also less space
efficient. The input capacitance of the above drawn inverter is about 10 fF, which is
approximately equal to a 100-μ run of interconnection metal. If this cell is meant to drive a 1-
mm metal line, its fanout is effectively 10, before it gets connected to a single logic device
input.
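The effective-fanout idea can be sketched numerically. The 10-fF input capacitance and the "10 fF per 100-μ run" wiring figure come from the paragraph above; the helper itself is my own illustration:

```python
# Effective fanout of a cell output, including wiring load.
C_INPUT_FF = 10.0       # one standard input, from the inverter above
C_WIRE_FF_PER_UM = 0.1  # implied by 10 fF per 100-um run of metal

def effective_fanout(wire_length_um, n_inputs):
    """Total load expressed in units of one standard input."""
    c_total = C_WIRE_FF_PER_UM * wire_length_um + C_INPUT_FF * n_inputs
    return c_total / C_INPUT_FF

# A bare 1-mm line already looks like a fanout of 10; adding one real
# logic input makes it 11.
print(effective_fanout(1000, 0))
print(effective_fanout(1000, 1))
```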
If interconnect metal had no influence on performance, then the devices within the cells
could be made very small, because as device width is reduced, the drive capability and the
input capacitance are simultaneously lowered, resulting in a net wash in terms of speed.
Because wiring load is a factor, autorouted designs may require many iterations of design at
the RTL level, synthesis, autoroute, and logic simulation using the autoroute standard delay
format (SDF) output file that will reflect metal loading values. In high speed designs, it is
common to repeat the process over and over. During each iteration, emphasis is put on a
particular set of nodes at the RTL level, instructing the synthesizer to drive a particularly
sluggish node with more aggressive drivers, or at the auto layout level, instructing the place
and route engine to route a particular set of lines more efficiently. The new result, however,
will often show a new set of connections with slow timing, as the layout will be completely
different from the previous one. The expression “herding of cats” comes to mind.
Designing with a low fanout, making your standard cell devices very large, or running your
circuitry at low speed may help in these cases, but nothing beats a tiny library and hand
placement, where you can be fully aware of metal runs, as you draw them. Small devices in
your standard cell library will draw less power, and fit into a smaller space, but may suffer
from the consequence of autorouter indifference.
There’s a lesson here: autorouted (and especially synthesized and autorouted) circuitry
will not only be inefficient due to poor wiring choices by the software, but may require the
standard cell library to be oversized and quite power-consumptive, representing yet another
level of layout inefficiency that can go unnoticed.
OK, let’s go on to the “tough” cells to see if we can live with the 14-μ cell height.
Let’s start with a full adder, as in the layout shown in Figure C5.4 of the color section.

Fitting Cells into a Constant Rail Height


This adder was 15.2-μ tall, so our standard cell set must be at least this tall. The full adder is a
complicated function that takes A, B, and C inputs, and produces a sum out (S) and a carry out
(CO). Notice that the C input (carry in) is connected to three N and P devices, while the A and
B inputs are attached to four each. The three inputs A, B, and C are logically identical, but
minimizing the load on the C input will allow increased speed in arithmetic circuits. The C
input is positioned toward the opposite end of the cell from the CO output, so that in a row of
such adders, performing an addition on a wide data signal, the carry out of a stage to the right
of the adder can be connected through a minimum width metal wire to the carry input of this
cell. Fast carry propagation is a serious consideration in arithmetic circuits.
It should not be assumed that such considerations were made when a foundry-supplied
library was developed. You may wish to construct a few different full adders for hand-packed
circuits, perhaps one with the C input on the other side, by the CO output, so that carry signals
can propagate vertically down an array, as may be required in a hand-packed multiplier. In
fact, many foundry-supplied libraries may work, and be well characterized, but the typical
library will be quite inefficient; a library for this process would have a 30- to 40-μ cell height,
with inefficient use of space. The library we’re trying to develop here will be efficient and
especially useful in handcrafted, very dense designs.
Notice also that signal routing is being done within the cell with Poly, which has a sheet
resistivity of about 40 Ω/sq. If you run the calculations, though, the delay imposed by short
Poly runs such as these is trivial to cell performance. As long as the runs aren't too
long, feel free to use Poly as wiring. Also, notice that active is contacted by single contacts. If
you rip apart other designs for inspection, you may see double contacts within standard cells,
but this is not required. Yield should not suffer due to the use of single contacts within your
cell designs. Do not, however, attempt to connect very large devices (W > 10 μ) with a single
contact, particularly at the MOSFET source, as the increased source resistance will lead to
reduced output current.
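"Run the calculations" on a short Poly jog looks like this. The sketch is mine; the 40 Ω/sq figure is from the process listing above, while the ~0.1 fF/μ² poly-over-field capacitance and the 5 × 0.6 μ run are assumed illustrative values:

```python
# Rough lumped-RC of a short intra-cell Poly run, showing why it is
# harmless compared with gate delays.
def poly_rc_ps(length_um, width_um, rsheet=40.0, cap_ff_per_um2=0.1):
    """Lumped RC of a poly run, in picoseconds (1 fF * 1 ohm = 1e-15 s)."""
    r = rsheet * length_um / width_um           # ohms
    c = cap_ff_per_um2 * length_um * width_um   # femtofarads
    return r * c * 1e-3                         # fF*ohm -> ps

# A 5-um x 0.6-um intra-cell poly jog: about 0.1 ps, negligible next to
# cell propagation delays measured in hundreds of ps.
print(poly_rc_ps(5.0, 0.6))
```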
The internal Poly wiring is spaced within the cell edge by 0.4 μ, so that cells abutted GND
to GND or VDD to VDD will not violate the Poly space rule of 0.8 μ. Such abutments may
cause difficulty when “flipping” and vertically abutting different cells, as the substrate/well
diffusion connections may not line up in a fashion that satisfies the rules. This issue can be
dealt with when hand placing cells, but can cause problems if your autoroute program allows
cells to be abutted in this way.
Finally, a quick look at the adder probably raises the question: “How the hell did you
come up with that?” Well, I didn’t do it quickly. The full

adder is one of those handy but complicated functions that can be built from simple gates, but
the interconnected gate version propagates slowly and takes up a lot of space. This adder is the
result of looking at other designers' standard cells, getting ideas, and then figuring out a set
of series and parallel MOSFETs that does the job most efficiently. I've
included cells such as this so that you can use them as examples; don’t get the idea that it’s
expected for you to just knock out such a complex function, in such a condensed form,
quickly.

Autoroute Considerations
Before we go on, mention should be made of the positioning of the VIAs that constitute
connections to the cell. They are positioned exactly down the center of the cell, between the
GND or VDD rails, although this is not necessary. Further, observations of the rules show that
the closest spacing between VIAs is 2 μ exactly (the isolated VIA pitch is 2 μ). Also, the
distance from the center of the VIAs at the right and left ends of the cell are exactly 1 μ from
the cell edge. This can be helpful to the autoroute engine, as all wiring will be performed on a
2-μ pitch. If we had placed the VIAs without regard to the interconnection pitch that the
autoroute tool can efficiently default to, the autoroute tool will potentially produce messy and
irregular results.
Standard cells are called standard cells because they have standardized dimensions so that
they may abut side to side in any order, without rule violation or signal shorts. When you
design a standard cell set, pay close attention to what happens when any two cells are abutted
side to side. In the case of our VIA connections being placed on 2-μ centers, we also make
sure the first possible instance of a VIA is half that dimension, or 1 μ in from the edge, so that
the interface between two cells will lie exactly between two possible vertical wiring paths.
Also, we will make each cell width a multiple of the 2-μ wiring pitch so that any arrangement
of standard cells will have connections on our standard pitch. You can draw other cells, of any
arbitrary size or shape, but they won’t be standard cells if they don’t meet the library
dimensioning requirements.
We should probably try to see if the (typically) worst cell, the reset flip-flop, can fit into this
15.2-μ high scheme, but let’s first look at just one more aspect of how the cells will be used
in an autoroute situation. This consideration will not affect our two metal process, but it would
be important if a three metal option was available. I’ll assume the optional VIA2 will have
the same rules as VIA, and that the optional M3 will have M2-like rules, so the pitch of VIA2
will also be 2 μ.
The autorouter will place M1 horizontally, between the rails in the wiring channels, M2 will
be placed vertically, and M3 horizontally, like

M1, except that M3 can pass over the top of the cells and M1 obviously cannot. Calculate
from the rules the closest a VIA cell can be to the rails, imagine M3 lines above the M1 lines
that can also overlap the cells, and then imagine how the M3 lines will lie above the cell;
ideally, one M3 line will be in alignment with the VIA cells within the standard cell. If
you're interested in getting the most compact autoroutes, consider how the autorouting tool
will work best with your cell design, and the worst-case VIA and VIA2 pitches. The
autorouter does a better job when its routing is on a strict pitched grid.
Further, if you ever want to adjust or hand compact the autorouted results, it will be much
easier if the wiring is neat and regular.
In our case, with these imaginary rules, the VIA pitch is 2 μ, and the space between the
center of a VIA cell in the wiring channel must be no closer than 1.4 μ to the rails. The
optimum cell height becomes:

where N is an integer.
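Under the rules assumed here (a 2-μ VIA pitch with VIA centers kept at least 1.4 μ from each rail), candidate cell heights can be enumerated with a short script; the formula H = 2N − 2.8 μ is my reading of the constraint, consistent with the 15.2-μ and 17.2-μ heights discussed below:

```python
# Enumerating candidate cell heights: a height H works when H plus the
# two 1.4-um clearances spans an integer number N of 2-um track pitches,
# i.e., H = 2*N - 2.8 (um).
PITCH = 2.0
CLEARANCE = 1.4

def cell_height(n):
    return PITCH * n - 2 * CLEARANCE

heights = [round(cell_height(n), 1) for n in range(8, 12)]
print(heights)  # includes 15.2 (N = 9) and 17.2 (N = 10)
```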
Fortuitously (I did not intend this), 15.2 μ would work perfectly. However, in this case, N =
9 (odd number), so our VIA cells for connection must be offset toward VDD or GND by a half
pitch, or 1 μ. If this doesn’t work for our reset flip-flop, our next ideal cell height will be
17.2 μ, and if so, our connection VIAs can be placed down the center of the cell. Let's try to
get this ugly function into the 15.2-μ cell height, which is shown in Figure C5.5 of the color
section.
Luckily, the reset flip-flop (DFFR) can be made to fit with the same rules as the full adder.
We now have a good chance of completing our standard cell library using this cell height and
Well dimension.
Like the adder, this function is pretty complicated, but it’s actually easier to understand
than it would appear at first sight. The DFF (without reset) is probably the best place to gain
an understanding of how the cell works. The reset function is simply added to the basic DFF
layout, although not without considerable difficulty. If you carefully follow the DFF
schematic, you will see what’s happening.
Notice a few things about the DFFR. First of all, the clock input (leftmost terminal) is
directed to the switches within the cell, and inverted to control opposing switches. The clock
input is not fully buffered, causing the input load at the clock terminal to be similar to perhaps
three normal inverter inputs. This is a relatively heavy load for a standard cell. A clock input
buffer could be employed to lighten the load at the clock terminal, but it would also delay the
clock into the flip-flop, causing the setup time to go negative and the hold time to be extended.
Usually, such flip-flops are driven by strong drivers, so flip-flop design without clock input
buffering is acceptable.

Further, the output is buffered; that is, the Q output could have been taken from signals
internal to the flip-flop, but these signals drive poorly. Output buffering allows the flip-flop to
drive other circuitry more aggressively. The QN output is from the flip-flop internal circuit,
but it comes from a rather strong drive point, and is quite resistant to disturbance due to line-
to-line capacitance.

The Standard Cell Library


Figures 5.5 through 5.14 show a simple set of standard cells. The set is small, ideal for both
autorouting small circuits and hand layout. It is organized into 19 cells covering the simple
functions from which hand layouts can be prepared, plus a small set of complex functions
that are more usefully employed by logic synthesizers: the and-or-invert and or-and-invert
gates. These can be prepared as you wish, but are not required for either hand placement or
synthesis. The 19-cell set is easy to create and much simpler than the several-hundred-cell
sets you might get from a foundry. It is a sandbox standard cell set.
Layouts of the standard cells are shown in the color section in Figure C5.6.
The last three layouts of Figure C5.6 are required for the autorouter. They allow contact to
VDD or GND, and in the case of the last cell, allow a crossover for M2, so that this metal
layer can connect one wiring channel to another, across an otherwise packed row of standard
cells. All of the cells have such cross ports drawn within them, wherever a

Figure 5.5 Sandbox simplified standard cell symbol set.

Figure 5.6 Inverter and NAND gate schematics.

Figure 5.7 NOR gate schematics.

Figure 5.8 Reset flip-flop schematic.

Figure 5.9 D flip-flop and 5 ns delay schematics.

signal connection does not exist, based on the strict 2-μ wiring rules. All ports are drawn as
line ports to be the minimum M2 width (0.8 μ in this case). The autorouter will align M2 to
the center of each port, using minimum width M2.
The cells also have a rectangular port on the Icon/Outline layer, named Abut, that defines
the outer extremes of the cell so that the autoroute

Figure 5.10 Latch and half adder schematics.

Figure 5.11 Tristate buffer and 2-input MUX schematics.

Figure 5.12 Full adder and XOR gate schematics.

Figure 5.13 And-or-invert gates.


Figure 5.14 Additional and-or-invert gates.

tool can place cells side by side in a row (Tanner convention). This port is drawn around the
entire cell to the furthest extent of the M1 rails. VDD and GND line ports are attached to the
ends of the rails.
Notice the DEL5ns cell, consisting of eight inverters in series. The first inverter is a short
gate so that input loading is minimal. Subsequent inverters are built with long gates so that the
signal may propagate slowly enough to achieve a 5 ns overall delay. Since the function is
several stages long, the rising/falling delay times are nearly identical, despite a nonideal P/N
width ratio. The last inverter is of the short-gate type again, so that external loads may be
driven more aggressively. The schematic is drawn as eight inverters, since this is both accurate
and convenient. All other cell schematics are drawn using transistors, and have the level
attribute (I use the name Verilog) attached to their symbols, instructing the autoroute and
simulation software to stop at that level and interpret the cell as a standard cell. The DEL5ns
cell must have this attribute attached, or the schematic will be interpreted as eight simple
inverters, and they will be used by the autoroute tool instead of the DEL5ns cell. The
difference in delay will be significant. The autorouter will not complain, as inverters are
perfectly acceptable for routing, and hopefully the error will be caught in logic simulation. Be
careful when drawing standard cells using the schematics of other cells, as the level attribute
must be used to ensure that the top level cell is used instead of the lower level cells.
I bring this up because it once happened to me. Because I failed to attach the stop level
attribute to a delay cell, it was autorouted as eight inverters, and the logic simulation did not
show a logic error. The condition was on the edge of proper timing though, and the fabricated
silicon was unreliable. As the simulator will demand a model, I suggest that the

simple cells that are very analog in nature, such as this delay, should be drawn as transistors
(not inverters), so that there is no confusion.
All of these cells are drawn as simply as possible, so that you may see more clearly how
such functions are built. They may propagate faster if the internal devices are sized a bit
differently, but the more critical factor for logic speed in a system is that of fanout.
The plot of Figure 5.15 was prepared using a 2.15-V threshold potential; that is, the
propagation was measured from the input signal (200 ps rise/fall times) as it crossed the 2.15-
V level, to the point in time where the output crosses the 2.15-V level. The difference between
rising and falling delay times was so close as to not require differentiation. Although the P/N
width ratio leads to an imbalance in rising/falling output drive, the effective threshold being
depressed by this same imbalance provides some advantage in producing similarity in
rising/falling propagation delays, provided the depressed threshold is used as a measuring
point.
Now we can begin to gain an understanding of how the cells may be characterized as to
propagation delay, input capacitance, and output drive currents. These parameters will be
required when the cells are abstracted into models for logic simulation.
Simple cells will have varying input thresholds and output drives, which can cause
difficulty in abstracting propagation delays for logic simulation purposes. In the case of our
inverter that has an input threshold of approx 2.15 V (at VDD = 5 V), the rising and falling
propagation delays are approximately equal when measuring signal timing at this threshold
potential. If measured at mid-supply, however, the rising output delay is significantly greater
than the falling output delay.
Figure 5.15 Plot of INV1 propagation delay as a function of output loading.

The NAND4 cell has four N devices in series, forcing the input threshold to a higher potential
of 2.1 to 2.4 V, depending on the input stimulated; the NOR4 cell, with four P devices in
series, shows a threshold of approximately
1.6 to 1.8 V, depending on which input is stimulated. This leads us to the question: What
conditions are to be used during SPICE simulation to determine propagation delays?

Standard Cell Propagation Delay


As a standard cell designer, you must define the cells according to their intrinsic delay from
inputs to outputs as a Verilog model file for direct use by the logic simulator. Also, you must
define the output drive resistance, both P pulling high and N pulling low as independent
values, and input capacitance values in a Liberty file (.lib) as information used by the
autoroute tool while it constructs an SDF file. The Liberty file format was developed by
Synopsys and is widely used in conveying timing information to a logic simulator, either an
assumed loading at the synthesis level, or a calculated loading from an actual layout.
The fact is that in a practical standard cell library, no simple model of delays, loads, and
driving resistances will precisely predict the cells’ performance over a wide range of input
signal rise/fall times, but an approximation can be derived that is satisfactory for logic
simulation purposes. Consider the SPICE plot of the INV1 function (Figure 5.16), loaded with
a capacitance that equals that of 10 inverters (fanout = 10).
Figure 5.16 Transient response of INV1 cell, loaded by 10 INV1 inputs.

The traditional method of determining the delay time of a cell is to measure the difference in
time between the moment the input crosses the 50% point and the time the output crosses the
50% point. In this case, the
falling output delay is about 550 ps, the rising output delay is about 850 ps. From this we
could estimate the equivalent N and P device on-resistances, and calculation of gate areas can
give a value for input capacitance; both are ingredients of the Liberty file. These values,
combined with the basic delay of the cell measured with no load, compose a Verilog model
that will be used for logic simulation.
Unfortunately, ours is not an idealized situation, because our standard cells do not respond
at the 50% point. In fact, since the threshold of this cell, the inverter, is more like 2.15 V (43%
of supply), the propagation delay of the inverter will be longer with falling input signals than
with rising ones. Further, this effect will only increase as the input rise and fall times increase.
To compensate for this problem, advanced tools will accept a more complex expression to
calculate the delay more accurately by taking input rise time (calculated from drive and
loading at the cell input) along with cell loading at its output.
The logic simulator deals with transitions between logic levels as events that drive cell
model inputs and produce output events after a calculated delay period. The delay period
depends on the cell model’s intrinsic delay. In the case of an applied SDF file from an
autoroute, additional delays are calculated due to metal resistance and capacitance and the
effect of fanout. Without an SDF file, ModelSim will not even take fanout into consideration.
Small circuits that are placed by hand will need to be analyzed using SPICE to inspect signal
propagation and rise and fall times. SPICE, however, cannot be used to do extended logic
verification; it is far too slow.
I would like to suggest as an alternative to the complicated timing model approach, which
your tools may not support, and in opposition to tradition, that you make all of your delay and
drive resistance calculations based not on the traditional 50% threshold, but on one that
actually represents the input thresholds of your standard cell library. There will be no single
input threshold voltage value across the entire library, but the vast majority of cell inputs will
be very similar, and the variations from that mean will be reasonable. Thoughtfully select a
threshold voltage from a few selected SPICE simulations (DC sweeps) and use that voltage for
timing measurements on all of the library’s cells. Do your delay measurements from inputs
and outputs as they cross that threshold. Then, calculate the N and P device resistances from
the time that the devices require to charge load capacitance to your library’s threshold. You
will find that, despite the P devices having an actual resistance that is considerably higher than
the N devices, when this new calculation is made the resulting resistance values you will use
in your liberty file will be quite similar, and accurate, when the correct cell threshold is
considered. Once this is done, your simulations will actually be more accurate than those using
the artificial 50% point in calculations.

Now that’s a pretty complicated set of ideas, and may require rereading to fully take hold.
So that the drive resistance calculation can be made easily, follow these instructions:
1. Simulate the cell with no load and measure the propagation delay from input (as it crosses
the library’s threshold voltage) to output (as it crosses the library’s threshold voltage).
Use this as an intrinsic delay value in the Verilog model, and provide this information for
both rising input and falling input cases. Use a rather fast, but realistic, rise/fall time in the
stimulus waveform, perhaps 0.2 ns for 0.35-μ cells, maybe 0.4 ns for 0.6-μ cells.
2. Load the cell with a significant load, perhaps equal to 20 standard inverter loads (as a
capacitance to ground), and perform the simulations again.
3. Subtract the unloaded from the loaded propagation delay values and divide the results by
the added load capacitance to find drive transistor resistances. The P resistance is found
when the output is rising; the N resistance is found when the output is falling. These are not
actual drive resistances, but effective drive resistances when the cell library’s threshold
voltage and the mechanism for delay calculation are taken into account.
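The three steps above reduce to a few lines of arithmetic. Here is a minimal Python sketch in which every delay and capacitance value is an illustrative placeholder standing in for real SPICE measurements taken at the library's threshold voltage:

```python
# Effective drive resistance from two SPICE-style delay measurements.
# Every number below is an illustrative placeholder, not real library data.

def effective_drive_resistance(t_unloaded, t_loaded, c_added):
    """Step 3: (loaded delay - unloaded delay) / added load capacitance."""
    return (t_loaded - t_unloaded) / c_added

C_LOAD = 140e-15            # 20 standard inverter loads at ~7 fF each
t_rise_unloaded = 0.25e-9   # output rising, no load (step 1)
t_rise_loaded = 0.95e-9     # output rising, loaded (step 2)
t_fall_unloaded = 0.22e-9
t_fall_loaded = 0.78e-9

r_p = effective_drive_resistance(t_rise_unloaded, t_rise_loaded, C_LOAD)  # P pulling high
r_n = effective_drive_resistance(t_fall_unloaded, t_fall_loaded, C_LOAD)  # N pulling low
print(f"P: {r_p / 1e3:.1f} kohm, N: {r_n / 1e3:.1f} kohm")
```

These become the rise_resistance and fall_resistance candidates for the Liberty file, while the unloaded delays from step 1 become the intrinsic delays in the Verilog model.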
This technique does not result in exact delay modeling, but will be far more accurate than
that of determining output device resistances directly, and using the 50% points for
determining propagation delays. The approach I suggest here is perfectly adequate for sandbox
projects; it’s simple, to the point, and effective.
The SDF file is created from your input capacitance values (input gate area times gate
capacitance per square micron), in parallel with trace capacitances, and the output drive
resistances of your cells in series with trace resistances to produce new delay times that are
inserted into the netlist during simulation. The delay calculation is based on the Elmore delay
model. One may argue that the Elmore delay model is basically incorrect, as it models a delay
as a time constant; for example, a 1K drive resistance, loaded with a 1 pF capacitance will
result in a 1 ns delay from the Elmore calculation, and this delay will be inserted into the logic
path during simulation. However, a time constant is the time for a signal to reach 63% of its
final value, not the 50% point, and as we see above, certainly not the threshold of our cells for
rising signals. It is disappointing that the Elmore delay model operates in this way, but perhaps
understandable, as it should predict delays on the pessimistic side, which is at least safe. When
we model our output resistances in the manner defined above, the Elmore delay model is
very accurate, with the exception of metal resistances, which cannot be
extracted in a manner that is correct for our displaced-threshold technique. Fortunately, such
metal resistances are much smaller than the gate output resistances, and inaccuracies will
occur in very few places, typically only when very large output buffers are driving very long
metal lines, and, in particular, in finer geometries (<0.18 μ), where metal resistances are high.
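The gap between the Elmore time constant and an actual threshold crossing can be checked with the RC step response: a charging output crosses the fraction f of its swing at t = -RC·ln(1 - f). A short Python sketch comparing the Elmore insertion against the 50% point and the 2.15-V/5-V threshold:

```python
import math

def rc_crossing_time(r, c, fraction):
    """Time for an RC step response to reach `fraction` of its final value."""
    return -r * c * math.log(1.0 - fraction)

R, C = 1e3, 1e-12                       # 1-kohm drive into a 1-pF load
elmore = R * C                          # Elmore inserts one full time constant (1 ns)
t_63 = rc_crossing_time(R, C, 0.63)     # essentially the Elmore value
t_50 = rc_crossing_time(R, C, 0.50)     # traditional 50% measuring point
t_43 = rc_crossing_time(R, C, 2.15 / 5) # this library's 2.15-V threshold on a 5-V swing

print(f"Elmore {elmore * 1e9:.2f} ns, 63% {t_63 * 1e9:.2f} ns, "
      f"50% {t_50 * 1e9:.2f} ns, threshold {t_43 * 1e9:.2f} ns")
```

The 50% and threshold crossings arrive well before the full time constant, which is why the raw Elmore insertion errs on the pessimistic side for rising signals.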
If you are concerned about the accuracy of your models, build some simple circuits and
simulate them in ModelSim, then simulate the layout directly in SPICE, and compare the
results. SPICE will be accurate, ModelSim will be relying on your timing abstraction; adjust
drive resistances until you are satisfied that the models are a good representation.

Verilog Models
Verilog files that describe the logic functions of the standard cells can be written using
primitive functions in the Verilog language, such as and, or, nand, nor. An example of the
NAND2 and NAND3 functions in a Verilog file follows:
// Cell NAND2
`celldefine
`timescale 1ns/10ps
module nand2 (Y, A, B);
  input A, B;
  output Y;
  // Function
  nand (Y, A, B);
  // Timing
  specify
    (A => Y) = (0.08, 0.09);
    (B => Y) = (0.07, 0.08);
  endspecify
endmodule
`endcelldefine

// Cell NAND3
`celldefine
`timescale 1ns/10ps
module nand3 (Y, A, B, C);
  input A, B, C;
  output Y;
  // Function
  nand (Y, A, B, C);
  // Timing
  specify
    (A => Y) = (0.1, 0.1);
    (B => Y) = (0.1, 0.1);
    (C => Y) = (0.1, 0.1);
  endspecify
endmodule
`endcelldefine

Propagation delays are specified as rising and falling for each input path to the output.
Flip-flops do not have Verilog primitives, as they require a register that must be declared for
each model. Also, the more complex functions, such as the full adder or half adder, may
require a primitive definition. A user-defined primitive (UDP) can be defined in a Verilog
format to be compiled with the cell timing information during simulation. Once a UDP is
defined, it may be used as though it was a standard primitive. In this way, the Verilog
language may be expanded. A UDP Verilog file that can be included along with the model file
looks like this:
// This is for a rising-edge D -> QN output F/F
primitive dff_rc_dqn_udp (QN, C, D, notifier, gsr);
  output QN;
  input C, D, notifier, gsr;
  reg QN;
  initial QN = 0;
  // ? : any value
  // * : changed state
  // X : undefined
  // - : current value
  // Input pin order is taken from the input declaration, not the pin list.
  table
  // C    D    notifier gsr  QN : QN+
    (01)  1    ?        0    : ? : 0; // Rising clock edge with D=1, set QN=0
    (01)  0    ?        0    : ? : 1; // Rising clock edge with D=0, set QN=1
    (0X)  ?    ?        0    : ? : X; // Clock 0->X, set QN=X
    (1?)  ?    ?        0    : ? : -; // Falling edge, hold current QN
    (X0)  ?    ?        0    : ? : 0; // initial condition
    (X1)  ?    ?        0    : ? : 0; // initial condition
    ?     (??) ?        0    : ? : -; // Clock steady, changing D, hold QN
    ?     ?    *        0    : ? : X; // Timing violation happened
    ?     ?    ?        1    : ? : 1; // Forced reset from the test bench
    ?     ?    ?        (10) : ? : -; // Stay the same as reset goes false
  endtable
endprimitive

This UDP can be used to define a flip-flop that would be declared in the model file (along
with the other cell functions). Notice that the UDP shows the D input propagating to the QN
output (as actually happens in the cell), and the Q output is defined in the cell model as the
inverse of QN. The corresponding cell definition in the model file would look like this:

// Cell DFF
`celldefine
`timescale 1ns/10ps
module dff (Q, QN, C, D);
  input C, D;
  output Q;
  inout QN;
  reg notifier;    // Changed when a timing violation occurs
  wire gsr = 1'b0; // Tie off the UDP's global set/reset (no test-bench reset here)
  // Function
  // Using the user-defined primitive for a D F/F, see SB_udp.v
  not (Q, QN);
  dff_rc_dqn_udp (QN, C, D, notifier, gsr);
  // Timing
  specify
    (C => QN) = (0.3, 0.3);
    (QN => Q) = (0.3, 0.3);
    // Timing checks
    $setup (D, posedge C, 0.15, notifier); // Setup timing check
    $hold (posedge C, D, 0.05, notifier);  // Hold timing check
  endspecify
endmodule
`endcelldefine

The Verilog language requires some study for you to effectively put together simulation test
files, and many books are available on this subject.
An example of the Liberty file format for cell loading information looks like this:
library (EXAMPLE.LIB) {
  pulling_resistance_unit : "1kohm";
  capacitive_load_unit (1.0,ff);
  cell(DFF) {
    pin (D) {
      direction : input;
      capacitance : 7.0;
    }
    pin (C) {
      direction : input;
      capacitance : 16.0;
    }
    pin (Q) {
      direction : output;
      timing() {
        rise_resistance : 3.8;
        fall_resistance : 3.1;
        related_pin : "C";
      }
    }
    pin (QN) {
      direction : output;
      timing() {
        rise_resistance : 4.8;
        fall_resistance : 3.1;
        related_pin : "C";
      }
    }
  } /* dff written 3/10/04 */
  cell(inv1) {
    pin (A) {
      direction : input;
      capacitance : 7.0;
    }
    pin (Y) {
      direction : output;
      timing() {
        rise_resistance : 3.0;
        fall_resistance : 2.9;
      }
    }
  } /* inv1 written 3/10/05 */
}

Most logic functions are direct; that is, after an initial delay period, an input transition
propagates to an output directly. The flip-flops, however, require definitions of setup and hold
times, both relating to the condition of the D input as the clock terminal rises. SPICE
simulation with the D input changing at various points in time relative to the clock will reveal
how long before or after the clock edge D must be stable to propagate correctly to the Q
output. You will probably find this to be a very specific point in time. In the flip-flop drawn
earlier, this point in time may be on the order of 50 ps prior to the rising clock edge.
Nonetheless, the simulator can accept a setup time, the period when D must be stable prior to
the rising edge of the clock, and a hold time, which is the period of time that D must remain
stable after the rising edge of the clock. If the D input changes within this timing window, the
simulator will propagate an unknown value to the Q output, showing you that a timing
violation occurred. You do not want to design circuits that are “on the edge” of working,
and this is one mechanism (the establishment of data setup and hold) that allows you to
determine how robust your designs will be. Simulations can deliver a very good representation
of how your circuit will behave, but such simulations are never exact.

In the above example of the timing threshold being, say, 50 ps prior to the clock, I would set
the setup time at perhaps 250 ps, and the hold at perhaps 150 ps, establishing a ± 200 ps
“window” around the actual critical timing point. Beware of making hold times too long,
however, as a “toggle” flip-flop, constructed by connecting the QN output back to the D
input, may cease to work. If the propagation delay from clock to QN is shorter than the hold
time you’ve entered, the flip-flop will go “unknown,” which is most likely a very false
result. If you want to protect against simulation inaccuracies, while, perhaps, impairing your
ability to build extremely fast circuits, consider extending the setup time (perhaps several
nanoseconds), but be careful about extending hold beyond the CLK > QN delay time.
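The toggle-flip-flop caveat reduces to a one-line inequality: the hold time written into the model must be shorter than the clock-to-QN propagation delay, or the QN-to-D feedback violates its own hold window in simulation. A tiny Python check, using the illustrative numbers from this section:

```python
def toggle_ff_safe(t_clk_to_qn_ps, t_hold_ps):
    """A T flip-flop wired QN -> D only simulates cleanly when QN
    changes after the model's hold window has already closed."""
    return t_clk_to_qn_ps > t_hold_ps

# Illustrative numbers from this section, in picoseconds:
assert toggle_ff_safe(300, 150)      # 0.3-ns CLK->QN vs 150-ps hold: fine
assert not toggle_ff_safe(300, 500)  # over-padded hold: QN would go unknown
```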
Notice that the $setup and $hold statements in the Verilog flip-flop model define setup and
hold time and pass a violation to the variable term “notifier.” This is referenced in the flip-
flop UDP that then forces an X (unknown) to the flip-flop output.
The latch function (LAT) drawn earlier, is transparent-high, meaning that the Q output will
reflect the value on the D input when E is high, but hold the last value of D when E falls.
Setup and hold conditions can also apply here.
Building your own standard cell library may be a lot of work at first, but it will familiarize
you with the tools you will need to know to make custom designs. Hopefully, I have given you
a broad enough “taste” of the work for you to proceed with questions for which you can
find answers in your tool manuals, and perhaps a book on the Verilog language. The details of
cell design though, which I have attempted to put in as clear a form as possible, are rarely
found anywhere, at least in such a condensed and approachable form. I sincerely hope that you
benefit from considering these details, despite the somewhat nonlinear path I have been forced
to take in describing them.

Source : ASIC Design in the Silicon Sandbox Keith Barr 111

Peripheral Circuits
When we design a standard cell library, we are working within a tightly constrained and
protected environment, with logic levels that have a high degree of certainty. Nonetheless,
we must derive this certainty through analog simulation of the cells using SPICE and then
abstract the results into a form that allows us to see the cells as “logical” functions.
This abstraction is only for our convenience; ALL functions in an IC are fundamentally
analog in nature.
Throughout this book, I have tried to insert new concepts along the way, and while
such concepts apply to the issues at hand, they are only mentioned in passing and not
detailed to any great extent. The subject of IC design is filled with details; it is a
multidimensioned universe of concepts, and taking a linear approach would be lengthy
and boring. I do not want this to be a sequence of texts with no immediate relation to each
other, the conclusion to which would presumably be the implosion of disconnected ideas
into a complete understanding. The subject can be taken a piece at a time, with side issues
addressed along the way, in a more connected fashion. At some point, however,
previously touched ideas need to be more fully explained.
I suppose the subject of bonding pads, protection circuits, and peripheral busses could
be left till last, but because these circuits are so analog in nature, and require a deeper
understanding of semiconductor materials, it is a good place to pause before going on to
the issues of more traditional analog designs.
To gain a better understanding of the concepts I’ve tossed into previous chapters, and
to form a basis for subsequent subjects, let’s go over some of the material characteristics
in our sandbox, and perhaps new ways to understand them. I will place emphasis on
visualization as a substitute for mathematical equations, as an intuitive understanding
can be developed that is more effective in IC design than a strict numerical
analysis; you will be using numbers, and you can derive equations to use if that makes
you more comfortable, but seeing a problem that can be solved in your head is satisfying,
useful, relevant to the immediate issue at hand, and leads to a more thorough
understanding. If the result from your first mental calculation seems wrong, you can
always evaluate the situation from a different perspective, checking your first work and
developing a deeper understanding in the process. I find boring tasks, which aren’t so
energetic as to rob glucose from the brain, are perfect opportunities to mull over such
things, although I’m also known for passing freeway exits while thinking about IC
structures. We now live in an age where programs like SPICE can rigidly do
unimaginable number crunching, leaving us to do the less well defined but imaginable in
our heads. Let’s call this “sandbox thinking.”
A note on systems of units: We design ICs in microns, where a micron is a millionth of
a meter. This is necessary so that convenient numbers can be used to define extremely
small features. In the older CGS (centimeter-gram-second) system of units, 1 cm is equal
to 10,000 μ. Centimeters are handy for the computationally handicapped, as a cubic
centimeter of a substance is easily imagined; you can hold it in your hand, toss it into the
air and catch it again, feel its heft, and so on. Cubic meters are a slightly different story,
as only a material like Styrofoam can be handled without the assistance of serious
machinery. Finally, it is hardly possible that deviation from the MKS (meter-kilogram-
second) system should attract criticism, since in the IC industry, mils (thousandths of an
inch) continue to be the unit of measure at IC packaging companies.

Bulk and Sheet Resistivity


Previous chapters have referred to bulk resistivity and sheet resistivity, terms that may not
be immediately clear. In the CGS system, the bulk resistivity of a substance is the
resistance of a 1-centimeter cube of material when measured across opposing faces using
perfectly conducting contacts that completely cover those faces. The unit of measure for
bulk resistivity is the ohm-cm.
The resistance of any simply shaped conductor can be easily calculated by knowing its
dimensions and the material’s resistivity. Imagine the flow of current between electrical
connections to a single cube of material (with known resistivity in ohm-cm), stretch it out
to the real object’s length in the direction of current flow (which proportionately
increases resistance), stretch it out to the real object’s width (which proportionately
reduces resistance), stretch it out to the object’s height (which
also proportionately reduces resistance), and you’re done. Obviously, you do the opposite
when shrinking instead of stretching, but I’m sure you get the picture. Alternatively, you
could stack up 1 cm cubes of material in series and parallel until you get to the desired shape,
and sum up the resistances.
Material electrical bulk resistivity, in ohm-cm:

Silver 1.6E-6
Copper 1.7E-6
Gold 2.4E-6
Aluminum 2.7E-6
Nickel 2.8E-6
Tungsten 5.7E-6
Iron 9.7E-6
Titanium 42E-6
Silicon for MOSFET construction 20
Substrate bulk under epi Si layer 0.02

You may wish to scale the above numbers to a dimension that better applies to IC
structures, in which case you can convert the above table to ohm-microns by multiplying the
ohm-cm values by 10,000.
When layers of materials are flat and thin, the concept of sheet resistivity can be used. Sheet
resistivity can be found by simply dividing the material’s bulk resistivity by the sheet’s
thickness. A 1 oz PCB copper layer is defined as 1 oz of copper covering 1 ft2 of material,
which equates to 0.03 gm/cm2 of material. Copper has a density of 9, so the thickness of the
layer is about 0.0033 cm; its sheet resistivity is then approximately 500 μΩ/sq. I got this value
by realizing that the reciprocal of 0.0033 is about 300, and 300 times 1.7 is damn near 500.
You can calculate the resistance of a PCB trace by measuring its length and dividing by its
width, which gives you the number of imaginary width-sized squares of material that make up
the trace. This number of squares times the sheet resistivity gives the trace resistance.
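Counting squares lends itself to a two-line function. A Python sketch, rederiving the 1-oz copper sheet value from the bulk resistivity and thickness given above (the trace dimensions are made up):

```python
RHO_CU = 1.7e-6  # ohm-cm, copper bulk resistivity (table above)
T_1OZ = 0.0033   # cm, thickness of a 1-oz copper layer

def trace_resistance(length_cm, width_cm, sheet_res):
    """Resistance = (number of width-sized squares) x (ohms per square)."""
    return (length_cm / width_cm) * sheet_res

sheet = RHO_CU / T_1OZ  # ~0.5 mohm per square

# A made-up trace: 10 cm long, 0.5 mm (0.05 cm) wide -> 200 squares
r = trace_resistance(10.0, 0.05, sheet)
print(f"sheet {sheet * 1e3:.2f} mohm/sq, trace {r * 1e3:.0f} mohm")
```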
Another way of looking at the sheet resistivity concept is to imagine a thin square of
resistive material, and measure its resistance with perfect conductors along two opposing
edges. Now place four of these squares together to form a twice-sized square. We now have a
network of two squares in series, and an identical network in parallel with the first: The
resistance is identical to that of the single, original square. The concept, although possibly
difficult to grasp initially, is quite valid. The beauty of sheet resistance is that it is a
dimensionless quantity, and can therefore be used to evaluate geometries as aspect ratios,
without regard to size.

Typical sheet resistivity in ohms per square:

1/2 oz PCB traces 0.001
IC metallization 0.03 to 0.1
CMOS diffusion regions 100 to 200
Well feature 2K
Polysilicon gate 40
Metal silicide on diffusion or Poly 2 to 10
Polysilicon at various dopings 30 to 3K

You may also want to reduce the rules and characteristics of a process down to area
resistivity when making large-valued resistors out of polysilicon. The designed structure may
be a serpentine strip of poly that covers a large area, but, how large? You look up the
minimum poly width, the minimum poly spacing, and the sheet resistivity. Imagine a square of
minimum sized poly, stretch that square out one side by the poly spacing dimension, and
calculate the resulting rectangle area in square microns, then divide the sheet resistivity by that
area to obtain a characteristic resistance per square micron value. Once this is calculated, you
can quickly determine the resulting area that a minimum width poly resistor value will occupy.
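The poly-resistor bookkeeping can be sketched the same way. In this Python sketch, all process numbers are invented placeholders of the same order as the values quoted in this chapter:

```python
# All process numbers below are illustrative placeholders, not real design rules.
POLY_WIDTH = 0.4    # minimum poly width, microns
POLY_SPACE = 0.45   # minimum poly spacing, microns
SHEET_RES = 1000.0  # ohms/sq, a mid-range doped-poly value

# One minimum square of poly, stretched out on one side by the spacing:
cell_area = POLY_WIDTH * (POLY_WIDTH + POLY_SPACE)  # um^2 of layout per square
ohms_per_um2 = SHEET_RES / cell_area                # characteristic resistance per area

def resistor_area(r_target):
    """Layout area (um^2) a minimum-width serpentine poly resistor occupies."""
    return r_target / ohms_per_um2

print(f"{ohms_per_um2:.0f} ohms/um^2; a 1-Mohm resistor needs ~{resistor_area(1e6):.0f} um^2")
```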

Insulator Dielectric Constant


Capacitance calculations are also very simple. The permittivity of free space (or air for all
practical purposes) is 0.088 pF/cm, meaning that two 1 cm2 conducting plates spaced by 1 cm
will have a capacitance between them (ignoring fringe capacitance) of 0.088 pF or 88 fF. The
intervention of matter between the plates will always increase the capacitance, as all known
materials have dielectric constants greater than 1 (vacuum = 1.0). Calculations on various
capacitor geometries can be made by assuming the free space value; then scaling the result by
the material’s dielectric constant.
Dielectric constants of various materials:

Free space (vacuum) 1.0
Air 1.0006
Teflon 2.1
Mylar 3.1
Epoxy (cured) 3.6
Silicon dioxide 4.5
Silicon nitride 7.0
Glass (soda-lime) 7.8
Silicon 11.8
Water 78

We can imagine capacitors much as we do bulk resistors, by the stretching method.


Calculating fringe capacitance at the edges of a capacitor plate is a bit more difficult, but, in
general, if a capacitor plate is spaced from a much larger conducting surface, an extra amount
equal to the plate spacing can be added around the plate’s periphery to account for the fringe
value. Obviously, fringing will have a lesser effect on closely spaced plates.
Capacitance in semiconductor processes can be described in terms of capacitance per area,
since the spacing is fixed and the dielectrics are known. Conveniently, we can evaluate a poly-
poly cap as having a certain amount of capacitance per square micron, typically from 0.6 fF to,
perhaps, 1.2 fF, with a relatively low fringing value. All metal layers will also be specified in
both area (fF/μ2) and periphery (fF/μ). These layers will all have different spacings to
substrate, and this is reflected in their area and periphery capacitance values; lower metal
layers will have higher area capacitance but lower fringe, while upper metal layers will have
lower area values and proportionately higher fringe values.
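The free-space-then-scale method from the start of this section applies directly to a poly-poly capacitor. In this Python sketch the 50-nm oxide thickness is an assumed value; the dielectric constant is the silicon dioxide entry from the table above:

```python
E0 = 0.088e-12  # F/cm, permittivity of free space (0.088 pF/cm, as above)

def plate_cap(area_cm2, spacing_cm, k):
    """Parallel-plate capacitance: free-space value scaled by the dielectric constant."""
    return k * E0 * area_cm2 / spacing_cm

UM2_TO_CM2 = 1e-8
T_OX = 50e-7  # 50-nm inter-poly oxide, an assumed thickness (in cm)

c_per_um2 = plate_cap(UM2_TO_CM2, T_OX, 4.5)  # k = 4.5 for silicon dioxide
print(f"{c_per_um2 * 1e15:.2f} fF/um^2")
```

The result, about 0.8 fF/μm², lands inside the 0.6 to 1.2 fF/μm² range quoted above for poly-poly caps.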
As for the inductance of wires, the permeability of free space is 4 × π × 1E-7 H/m or 12.6
nH/cm. Infinitely fine wires would display this inductance, but real wires that actually have
diameter (like bonding wires) will show a slightly lower value. This can be used to estimate
bond wire inductance.
Also, bonding wires have resistance, which we can easily calculate. A 1 mil (yes, one
thousandth of an inch) diameter gold bonding wire has a resistance of about 1/2 Ω per cm of
length.
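Both bond-wire parasitics reduce to one-line estimates. A Python sketch using gold's bulk resistivity from the earlier table and the 12.6 nH/cm free-space figure (the 1.5-mm wire length is an assumed, typical value):

```python
import math

RHO_AU = 2.4e-6  # ohm-cm, gold (table above)

def wire_r_per_cm(diameter_cm):
    """Series resistance per cm of a round wire, from bulk resistivity over area."""
    return RHO_AU / (math.pi * (diameter_cm / 2) ** 2)

r = wire_r_per_cm(2.54e-3)  # 1 mil = 0.00254 cm; ~0.47 ohm/cm
l_bond = 12.6e-9 * 0.15     # 12.6 nH/cm upper bound x assumed 1.5-mm length
print(f"{r:.2f} ohm/cm, ~{l_bond * 1e9:.1f} nH per bond wire")
```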
The resistivity of the silicon substrate within which MOSFET devices are fabricated is
normally on the order of 20 ohm-cm. If we use our scaling technique, we find that a 1 micron
cube of such material has a resistance of 200,000 Ω. It is striking that although we may be
conducting several hundred microamps through a 1-μ wide MOSFET channel established on
the surface of the silicon, effectively a resistance of tens of thousands of ohms, the bulk
resistivity of the underlying silicon is much higher than that of the induced channel. It is also
important to note that on such a small scale, very large voltage drops can occur due to rather
small currents. This is a serious problem in the peripheral circuits of an IC, where transient
currents from off-chip can be measured in amperes, potentially causing severe internal circuit
damage.
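The 200,000-Ω figure follows directly from the ohm-cm to ohm-micron conversion mentioned with the resistivity table:

```python
# Bulk resistivity scaling: multiply ohm-cm by 10,000 to get ohm-microns.
RHO_SUBSTRATE = 20.0          # ohm-cm, typical MOSFET-grade silicon
rho_um = RHO_SUBSTRATE * 1e4  # 200,000 ohm-microns

# A cube of side s has R = rho / s, so for a 1-um cube of substrate:
r_cube = rho_um / 1.0         # ohms
print(f"{r_cube:,.0f} ohms")
```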

Semiconductors
OK, time to talk about how semiconductors work … uh, behave. Earlier on, I said that we
don’t have to know how these things actually work, just how they behave, and I can’t go
back on my promise. Let’s try to imagine the silicon at a microscopic level and gain an
understanding of what’s going on. In the process, I’ll resist the urge to draw cartoons of

Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com).


Copyright ©2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
PERIPHERAL CIRCUITS Keith Barr 116

Mr. Happy Electron dancing through a poorly drawn crystal…. We’ll try to imagine what’s
going on; the accompanying illustrations will be formed in your mind.
All ICs are made from a carefully grown and oriented layer of single crystal silicon. The
atoms in the silicon crystal are regularly arranged so that each silicon atom is “attached” to
four neighboring silicon atoms through the sharing of electrons. Each silicon atom has four
outer electrons that it shares with its neighbors. The perfect crystal lattice arrangement is only
disturbed by surfaces where neighbors are absent.
The conductivity of pure silicon is extremely low. Unlike metals which appear to be atoms
surrounded by a sea of mobile, interchangeable electrons, where you push an electron in on
one side, and another pops out on the other, a pure silicon crystal does not respond to an
electrical stress by passing a current; it is a fairly good insulator.
When dopant atoms are introduced into the silicon crystal, by adding them during
manufacture or bombarding them into the surface later, the electrical characteristics of the
silicon change. The silicon crystal is strong enough and regular enough to continue its basic
3D structure, despite the occasion of an included atom that doesn’t have the four bonding
electrons that silicon does. The common dopant elements are boron, phosphorus, and arsenic.
Boron has only three outer electrons, so when it takes the place of a silicon atom, an electron
is missing at that crystal location; a “hole” is formed. This is called P doping, as in positive
(because an electron is missing). Arsenic and phosphorus both have five outer electrons, so
either will fit into the silicon lattice with an electron to spare; this is called N doping, as in
negative (because an electron is added). Both of these doped versions of silicon
conduct electricity, and the conductivity increases directly with the dopant concentration. Too
high a doping level will cause the crystal structure to no longer hold its precise shape and
regular organization, so there is a limit to how heavily silicon can be doped. Most doping
levels are extremely low; on the order of one dopant atom for every 50 million silicon atoms
(typical for a substrate material).
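For a sense of how dilute this is in absolute terms: silicon contains roughly 5 × 10²² atoms/cm³ (a standard textbook figure, not from this text), so one dopant per 50 million silicon atoms works out to about 10¹⁵ dopants per cubic centimeter:

```python
# One dopant per 50 million silicon atoms, with silicon at roughly
# 5e22 atoms/cm^3 (assumed textbook value), gives a dopant density
# near 1e15 per cm^3 -- the "extremely low" substrate doping level.

SI_ATOMS_PER_CM3 = 5e22   # approximate atomic density of silicon

def dopant_density(atoms_per_dopant):
    return SI_ATOMS_PER_CM3 / atoms_per_dopant

print(f"{dopant_density(50e6):.1e} dopants/cm^3")
```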
The character of N or P doped silicon is the result of the conflicting requirements of the
component atoms in terms of charge neutrality. The silicon lattice determines the crystal
structure, and for silicon four electrons per atom are required for charge neutrality, but for P
dopants only three are required, and for N dopants five are required. A P dopant atom may
accept an electron from a neighboring silicon atom, since that will satisfy the crystal lattice
requirements, but then easily give it up to satisfy its own charge balance. When the P dopant
accepts an electron from a silicon neighbor, the neighbor then has a hole at its location. An N
dopant atom will be charge-neutral, holding onto its five electrons, but will give one electron
up to the lattice to satisfy the charge balance of the surrounding crystal structure; the extra
electron will then be accommodated
elsewhere by the silicon lattice. The electrical interplay between the host (silicon) and guest
(dopant) is one of continuous give and take in an attempt to attain charge neutrality throughout
the structure. In fact, in a P-type material, the holes are distributed throughout, not necessarily
localized to the dopant atoms. In N material, the excess electrons are not specific to the dopant
atoms, but also distributed throughout the crystal structure.
Holes or electrons within the silicon matrix have a localized potential field, shielded by the
surrounding silicon lattice. Currents are conducted through a single type of doped
semiconductor by the movement of excess electrons in N-type material, and holes, or the
absence of electrons, in P type material. These are called majority carriers in their respective
doping types. When a field is applied across a volume of doped semiconductor, the charged
regions move in response.

Diode Junctions
When an N doped region is abutted to a P doped region, a diode junction is created. When the
N and P regions are at the same potential, electrons are attracted locally out of the N region to
the P region, and holes are attracted locally from the P region to the N region until a charge
equilibrium is established. This causes a very thin, carrier-free zone between the two regions,
called a depletion region. The thickness of the zero-bias depletion region depends on the N
and P doping levels; high doping will cause a thinner depletion region.
A reverse bias is applied to the junction by making the P region more negative than the N
region, which electrostatically repels opposing charge carriers deeper into the silicon,
widening the depletion region. The depletion region will widen by growing into the two doped
regions, depending on dopant concentration, to establish a new equilibrium and a new
depletion region thickness. The total thickness of the depletion region grows as the square
root of the applied reverse bias voltage; that is, as the reverse potential is quadrupled, the
depletion region thickness is only doubled.
The depletion region is a good insulator, as indicated by the extremely low reverse-bias
leakage of a common diode.
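The square-root dependence can be sketched directly. The zero-bias width and built-in potential below are illustrative placeholders, not process values:

```python
import math

# Depletion width grows as the square root of the total junction
# potential: quadruple it and the width only doubles. The 0.1-um
# zero-bias width and 0.7-V built-in potential are illustrative.

def depletion_width_um(w0_um, v_reverse, v_built_in=0.7):
    """w0_um: zero-bias width; scales as sqrt of total junction potential."""
    return w0_um * math.sqrt((v_built_in + v_reverse) / v_built_in)

w1 = depletion_width_um(0.1, 5.0)     # total potential 5.7 V
w2 = depletion_width_um(0.1, 22.1)    # total potential 22.8 V: 4x the first
print(f"{w2 / w1:.2f}")               # quadruple the potential, double the width
```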

Zener Diodes
At some point, under a large reverse bias, the thin depletion region will be so severely
electrically stressed that breakdown will begin to occur. This is the avalanche or Zener region,
where further reverse bias causes increased current flow. The breakdown voltage of a Zener
diode is controlled by the doping density of the component N and P materials, which
determines the
width of the depletion region. High doping density translates to thin depletion regions and
lowered breakdown potential. The source and drain junctions in a CMOS process (diffusion
junctions) and the substrate and Well dopings are such that the junctions break down at a
potential that is on the order of the breakdown potential of the thin gate oxide. This Zener
characteristic of the process’s junctions is useful in designing protection circuits.
The breakdown of the depleted region in a Zener diode is a complicated process, where
thermally excited electrons are accelerated through the high electric field within the depleted
region, gaining sufficient kinetic energy to knock electrons out of their normal resting
positions within the silicon lattice, causing additional electrons and holes to be
generated, which constitutes additional conducted current. This avalanche process, generating
both majority and minority carrier currents, will become an important mechanism later, for
clamping transients in protection circuits.
When forward biased, that is, the P region being more positive than the N region, the
junction begins to conduct. In this mode of operation, the depletion region collapses
completely, and carriers from each doped region cross the junction to the opposing side. This
mode of current conduction is different from the current conduction mechanism in a single
type of doped material, as the N region is injecting electrons into the P region, and the P
region is injecting holes into the N region. Electrons in P material and holes in N material are
considered minority carriers, and as such do not fit in the normal charge balancing act
between silicon and dopant. Minority carriers will “drift” (also said to diffuse) through the
silicon lattice until they encounter recombination sites. In a typical diode, most recombination
occurs at the device connections where the silicon crystal structure is disturbed by metallic
contacts.

Bipolar Transistors
The previous section explains the mechanism of the bipolar transistor. As an example, let’s
consider an NPN transistor: The N-type emitter injects electrons into the P-type base region
when the base-emitter junction is forward biased. The base region, however, is P-type
material, and electrons are minority carriers in P silicon. A few of these electrons will
recombine in the base and constitute a base current, but most of the injected electrons will
encounter the N-type collector region before they can recombine in the base. In the N-type
collector region, electrons are majority carriers, and as such can be easily conducted away,
directly through the N doped silicon lattice. High gain bipolar devices have very thin base
regions so that the minority carriers within the base region spend little time there before
encountering the collector. Further, certain foreign elements,
such as gold and iron, are carefully excluded from the base region, as these elements can form
recombination sites that increase base current, decreasing transistor current gain.
Bipolar transistors are considered minority carrier devices, and suffer storage time
problems. This is the result of minority carriers left in the base region after the device has been
turned off; the injected charge, still in the base region, will continue to be collected until gone,
causing a delay in device turnoff. The same situation occurs in diodes, and is seen as reverse
recovery time. Fast recovery diodes are made by diffusing a thin junction into an epitaxial
layer. This moderately doped epi layer is grown onto a heavily doped substrate and may be
only a few microns thick; the abrupt transition from the epi layer doping (20 Ω-cm) to the
heavily doped substrate (0.02 Ω-cm) creates numerous recombination sites, and the number of
minority carriers that can be trapped in the epi layer is limited by its thinness. Epi layers on
heavily doped substrates are also used in CMOS processes, and we will see later how this
process enhancement can be used to advantage, particularly when designing sensitive analog
circuits.

The MOSFET
The MOSFET operates by electrostatically inducing a change in the silicon surface’s type
polarity, under the electrostatic influence of the gate. In an NMOS device, the N-type source
and drain terminals are implanted into a P-type substrate. The polysilicon gate material spans
the gap between the source and drain diffusions, immediately above the P substrate, insulated
by TOX. A small positive potential on the gate repels P-type carriers within the substrate
surface, allowing only very small currents to flow between source and drain through the
depleted silicon. A large positive potential on the gate causes inversion of the P substrate
surface by attracting electrons to the silicon surface, making the silicon appear N type in
nature. This allows a bridge between source and drain, allowing large currents to flow.
Although the NMOS device is fabricated in P-type material, and conducts by the flow of
electrons, the electron flow is essentially through N-type material (induced by the gate
potential), so MOSFETs are considered majority carrier devices. As such, they do not suffer
the storage time problems of bipolar transistors, allowing them to be extremely fast.
Another useful characteristic of the diode junction is that reverse voltage causes the
depletion region to widen, and the capacitance of the depletion region will therefore change.
Using this characteristic, voltage-variable capacitors can be built. In fact, the SPICE transistor
models that you will use to analyze your circuits take this into account automatically.

The SPICE model will have a characteristic capacitance per unit area for each junction, both
N/P-substrate and P/N-well, along with the capacitance value of the sidewall of the junction
(accounting for junction depth and perhaps a different junction “abruptness”), and
additional parameters that allow the modification of these capacitances depending on applied
voltage. The capacitance of a typical N junction in a P substrate will vary by a factor of about
2:1 when the bias voltage is swept between 0 and 5 V.
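A common form for this voltage dependence in SPICE models is Cj(V) = CJ0/(1 + V/PB)^MJ, applied separately to the area and sidewall capacitances. With representative grading parameters (illustrative, not foundry-supplied), the 0-to-5-V sweep reproduces roughly the 2:1 variation quoted above:

```python
# SPICE-style voltage-dependent junction capacitance per unit area:
#   Cj(V) = CJ0 / (1 + V/PB)**MJ
# The grading coefficient MJ = 0.33 and built-in potential PB = 0.7 V
# below are representative values for illustration only.

def cj_per_area(cj0, v_reverse, pb=0.7, mj=0.33):
    return cj0 / (1 + v_reverse / pb) ** mj

ratio = cj_per_area(1.0, 0.0) / cj_per_area(1.0, 5.0)
print(f"{ratio:.2f}")   # about 2, matching the text's 2:1 figure
```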

Electrostatic Discharge
When the IC industry was new, it was found that devices would fail on the production line and
in the field, due to electrostatic discharge. An entire industry quickly emerged with products to
ground workers while they assembled products, conductive plastic bags to protect ICs and
assemblies, antistatic sprays for carpets, and so forth. The problem, however, can be more
effectively solved at the design level.
It’s not absolutely necessary for protection devices to be built into your circuits; I once
had a 1.2 μ chip design in a product for years before production people came to engineering
asking for an improvement that would help a low-level fallout problem on the production line;
we discovered the production design was actually an engineering experiment that had no
protection devices at all. There may be specific applications where protection devices interfere
with your circuit, maybe where extremely low input capacitance or leakage is required, but in
general, protection devices are required.
For testing purposes, two basic models have been developed; the human body model
consists of a 100 pF capacitor charged to several kilovolts and discharged into the IC pin
through a 1.5K resistor, and the machine model consists of a 200 pF capacitor charged to
several hundred volts and discharged into an IC pin through a 500 nH inductance. The
charging voltage in each case determines what class of protection is desired.
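The human body model numbers translate directly into a peak current and a pulse time constant, which is why ESD events are ampere-scale but last well under a microsecond:

```python
# Human body model: 100 pF charged to several kilovolts, discharged
# through 1.5 kohm. Peak current is V/R; the pulse decays with tau = R*C.
# The 2-kV charging voltage below is just one example test level.

def hbm_peak_current_A(v_charge, r_ohm=1500.0):
    return v_charge / r_ohm

def hbm_time_constant_ns(c_F=100e-12, r_ohm=1500.0):
    return r_ohm * c_F * 1e9

print(f"{hbm_peak_current_A(2000):.2f} A peak")   # over an ampere at 2 kV
print(f"{hbm_time_constant_ns():.0f} ns tau")     # 150 ns decay constant
```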
The peak current that an IC pin must be able to conduct to VDD or GND can easily be on
the order of several amperes, but only for brief periods, usually well under a microsecond in
duration. Infrequent current pulses do not significantly contribute to metal migration, but can
lead to outright melting of metal if the paths are not properly designed.
The heat capacity of metal traces and the thermal conductivity of insulation layers indicate
that the thermal time constant of a metal trace separated from the substrate by a few insulation
layers is several microseconds; therefore, for submicrosecond current pulses, thermal conductivity
through insulating layers to the substrate will be of little help in keeping metal runs cool.
(The thermal conductivity of silicon dioxide layers is about
0.003 cal/(s-cm-°C), or about 0.014 W/(cm-°C).) As an exercise, try to follow this set of
quick calculations.
The density of aluminum is about 2.7 g/cm³, so the mass of a 10-μ square of 0.5-μ thick
aluminum trace is about 135 pg. The heat capacity of aluminum is 0.224 cal/(g-°C), so the
heat capacity of this 10-μ square is about 30 pcal/°C, which, in electrical terms, is about
126 pW-s/°C (a calorie equals about 4.18 W-s). The resistance of this square would be
0.054 Ω. Assuming a current pulse duration of 200 ns through the metal square, the power
from a 1 A pulse would be 54 mW, and over the pulse duration, a total energy of 10.8 nW-s
would be absorbed by the thermal mass of the metal, raising its temperature by approximately
85°C. A 2 A pulse, however, means a peak power of 216 mW, an energy of 43.2 nW-s, and a
temperature rise of 340°C. Aluminum
melts at 660°C, but the heat of fusion for aluminum is about 92 cal/g, requiring significantly
more energy input to actually melt the trace. A 4 A pulse calculates to a 1360°C temperature
rise along a 10-μ wide trace, with a voltage drop of 21.6 mV/μ of trace length. The trace
temperature would increase to 660°C, pause for a bit as the metal melts, then go on to settle
at a 910°C final temperature rise (as liquid aluminum).
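The same chain of quick calculations can be written as a short sketch, using the adiabatic assumption above (no heat loss during a sub-microsecond pulse) and handbook aluminum properties:

```python
# Adiabatic heating of one 10-um square of 0.5-um thick aluminum from a
# brief current pulse, ignoring heat loss (valid for sub-us pulses).

AL_DENSITY_G_CM3 = 2.7          # density of aluminum
AL_HEAT_CAP_CAL_G_C = 0.224     # specific heat, cal/(g-C)
AL_RESISTIVITY_OHM_CM = 2.7e-6  # handbook value; 0.054 ohm/sq at 0.5-um thickness
CAL_TO_J = 4.18                 # a calorie is about 4.18 W-s

def temp_rise_C(i_amps, t_s=200e-9, side_um=10.0, thick_um=0.5):
    """Temperature rise of one square of trace from an i_amps, t_s pulse."""
    vol_cm3 = (side_um * 1e-4) ** 2 * thick_um * 1e-4
    heat_cap_J_C = vol_cm3 * AL_DENSITY_G_CM3 * AL_HEAT_CAP_CAL_G_C * CAL_TO_J
    r_ohm = AL_RESISTIVITY_OHM_CM / (thick_um * 1e-4)   # one square of trace
    return i_amps ** 2 * r_ohm * t_s / heat_cap_J_C

for i_a in (1, 2, 4):
    print(f"{i_a} A -> {temp_rise_C(i_a):.0f} C rise")
```

The printed rises land near the text's figures of roughly 85°C, 340°C, and 1360°C for 1, 2, and 4 A pulses.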
Few IC designers make such calculations, but they can, and if they play long enough in the
sandbox, they will. You can see by this example just how straightforward such calculations
can be. Granted, they are rough first approximation estimates, but they provide useful
information; without the confidence to make such calculations, the whole mess becomes some
mysterious black art. It is the unknowns that frighten designers away from IC design, so it is
imperative that you develop techniques that will dispel the unknowns.

Bonding Pads and the Seal Ring


The design rules that cover bonding pads, protection circuits and the seal ring will be supplied
by the foundry, and in many cases leave you wanting to know why the rules are as they are,
but on this subject, you will likely find no one to explain them.
Rules will exist that restrict the size of bond pad openings in the top silicon nitride
(overglass) layer, as though bonding pads were the only reason for the mask layer. In fact, you
may wish to place probe pads within your design so that internal circuit potentials can be
monitored in the development stage, and for such purposes, even a 10-μ square opening would
suffice to accommodate a manually placed probe pin under microscopic observation. I
certainly don’t suggest badgering your fab for such information, but getting them to sign off
on small overglass openings will help in debugging analog layouts. Having small probe pads
available will allow their generous use.

Rules will govern the placement of pads in relation to nearby circuitry, and the answer to
the why question is invariably that it’s up to the packaging house, and how much damage
their leadbonding will cause to nearby circuits. The packaging house will respond with “give
us as much space as you can,” which is of little help. It’s particularly disturbing when you
abide by really wide rules only to find on a visit to the packaging house that they are routinely
doing extremely fine pitched work with no problems at all. I’ve found that, provided the
foundry will sign off on it (why should they care?), you can get a packaging house to do just
about anything that’s reasonable. When you talk with the packaging people though, convert
your design details into mils, because these guys only think in inches. What a world; it’s so
high tech!
When your chip is packaged, it will most likely be ball-bonded to the package leadframe. In
this process, a gold wire is passed through a tiny capillary tube that will be automatically
moved from attaching a ball at the end of the wire onto your IC’s pad, then dragging the wire
across to the leadframe where it is smooshed into place, while simultaneously breaking off the
wire within the capillary. The ball is created by passing the wire as it emerges from the
capillary through a flame that melts the wire end into a ball shape (old ones use gas, new ones
use an electric discharge).
The attachment is done at elevated temperature and through the assistance of ultrasonic
vibration of the capillary tip, to actually alloy the materials that are to be connected through
localized heating. The pads for ball-bonding are therefore square, and the squished ball that
attaches to the pad will be perhaps three times the diameter of the wire used. 1 mil wire is
common, which leaves a ball that’s about 75 μ across, and an attached area at the pad of
maybe 50 to 60 μ in diameter. A pad opening of perhaps 80 μ would do nicely, with
surrounding metal (under the overglass) of maybe 5 μ all around. In a pinch, you can go with
finer wire and smaller pad openings, on the order of 60 μ.
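The mil-to-micron arithmetic behind these pad sizes is simple enough to write down; the three-times-wire-diameter rule for the squished ball comes from the text:

```python
# Rule of thumb from the text: the squished ball is roughly three times
# the wire diameter. With 1-mil (25.4-um) wire, that is the "about 75 u"
# ball the text describes, which an ~80-um pad opening accommodates.

MIL_TO_UM = 25.4

def ball_diameter_um(wire_mil):
    """Approximate squished-ball diameter: ~3x the wire diameter."""
    return 3 * wire_mil * MIL_TO_UM

print(f"{ball_diameter_um(1.0):.0f} um")
```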
If you need really close pitched pads, they can be narrower but longer, to accommodate a
wedge bonding technique. In wedge bonding, the wire is smooshed onto the pads and the
leadframe similarly with a precise, narrow tool. After the second bond, the wire is tugged to
break off and a new bond cycle can begin. Bonding is always from the most critical spot (your
bonding pad) to the least critical (the leadframe). Figure 6.1 shows a comparison of these two
approaches.
Note: It was once common practice to place a Well feature under each bonding pad,
presumably so that if the bonder destroyed the insulation layers beneath the pad with excessive
pressure, the Well would provide some degree of isolation from the substrate. Two words:
Don’t bother. There is simply no end to the features designers will put into ICs out of fear or
ignorance, most likely the former due to the latter. This may be understandable, as tests are
expensive, but believe me, the well-under-pad idea is not necessary. For me, having my pads
capacitively connected to floating, leaky substrate diodes that emit minority carriers into the
substrate every time the pad potential goes to ground…. I’ll pass. Can you see this?

Figure 6.1 Bonding pad possibilities, for ball bonding or close-pitched wedge bonds.

The seal ring is one of those details that many foundries would like to actually place
themselves, as it does abut to the process control monitor (PCM) structures that they put
between die, but you may also wish to use this stack of surrounding metal layers as a ground
bus for the entire chip. Further, the seal ring is a perfect place to make a good ground
connection to substrate. For analog designs, I suggest a liberal seal ring, directly connected to
every ground pad, and as much contacted and P implanted active area as possible under it to
make substrate connection.
We’ve seen earlier on that the resistivity of bulk silicon is actually quite high when
viewed on a micron scale. Also, we know that we may use epi wafers, which are heavily
doped silicon substrates with a thin, epitaxially grown surface layer of the correct doping for
CMOS construction. Let’s imagine we’re attempting to connect our ground pads through
the seal ring to the underlying substrate using P diffusion through the epi layer. If the die is 4
mm on a side, the seal ring is 20-μ wide and the epi is 20 Ω-cm and 5-μ thick; not accounting
for metal resistivity, the resistance from ground pad to substrate is about 3 Ω. If the seal ring
was of a minimum dimension, say, 5-μ wide, our best contact to substrate through the seal ring
would be some 12 Ω. That may seem like a good connection when you’re considering 1 mA
currents, but when it comes to several ampere transient currents from protection devices, the
substrate is not a reliable “ground.” When epi is not used, the substrate is extremely flexible
electrically; a 4-mm square, 0.25-mm thick backlapped die would measure 800 Ω from one
edge to the opposing edge.
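The seal ring numbers above follow from treating the epi under the ring as a resistor of thickness t and area equal to the ring's perimeter times its width:

```python
# Seal-ring-to-substrate resistance through the epi layer, per the text's
# example: R = rho * epi thickness / (perimeter * ring width).

def seal_ring_r_ohm(die_mm, ring_um, rho_ohm_cm=20.0, epi_um=5.0):
    perim_cm = 4 * die_mm * 0.1          # four die edges, mm -> cm
    area_cm2 = perim_cm * ring_um * 1e-4
    return rho_ohm_cm * epi_um * 1e-4 / area_cm2

print(f"{seal_ring_r_ohm(4, 20):.1f} ohm")   # ~3 ohm for a 20-um ring
print(f"{seal_ring_r_ohm(4, 5):.1f} ohm")    # ~12 ohm for a 5-um ring
```

A fine connection for milliamp currents, but clearly not for the multi-ampere transients of protection devices.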
The die is attached to the leadframe pad (or paddle) with an adhesive, usually an epoxy, but
sometimes with a metal brazing material, especially in the case of ceramic packages. Epoxies
that contain silver particles can cause the backside of the die to have better conduction to the
pad, but without a special metal deposition on the die’s back side (at extra expense), the
metal particles will make a poor connection. Sometimes, designers will specify a down-bond
from a connection pad on the IC to the leadframe paddle, but this is troublesome for the
packaging house and should only be considered in cases that are thought through and deemed
to be absolutely necessary. Connect your IC to package leads, and make as good a connection
as possible to the silicon substrate through top-side diffusions.
Basically, we need the substrate to be a quiet environment in which to build sensitive
circuits, but we cannot depend on it as a “sink” for large currents.

Protection Devices
Early ICs would use diodes as protection devices, which are easy to imagine.
At first glance, the pad connects to an N diffusion in the substrate, which will turn on if the
pad is brought below GND. Also, the pad is connected to P diffusion in an N well that is tied
to VDD, which will clamp positive pad potentials to the VDD rail. It appears that we have two
diodes clamping the pad potential to the rails. The schematics were even drawn with this idea
in mind. In fact, the structure is two diodes, but also two bipolar transistors, as shown in
Figure 6.2.
Recalling the description of semiconductor conduction mechanisms, the N diffusion in the
substrate will inject electrons as minority carriers into the substrate when its junction is biased
on.

Figure 6.2 Protection devices as parasitic elements in the CMOS structure.

These minority carriers will “drift” through the silicon lattice until they find either a
recombination site or an N-type junction that can carry the current away as a majority current.
The same principle applies
to the P diffusion in the N well. The P diffusion will inject holes into the well (as minority
carriers) that will either recombine within the well or find their way to the substrate where
they then become majority carriers, conducting current directly to the substrate contact at the
extreme left of the drawing.
This pad protection technique actually worked, but some designs worked better than others,
and the design technique took on a magic quality.
Two things need to be understood to make this design work well. First, a deeper
understanding of majority and minority carrier conduction is required. Majority carriers (holes
in P material and electrons in N material) conduct as current carriers that are a part of the
silicon lattice; the carriers are influenced directly by electric fields, as in any resistive
conductor. A potential difference between two points on a semiconductor of a given doping
will cause the movement of majority carriers, resulting in a current that is proportional to
applied voltage—a resistance. Minority carriers are not associated with the lattice in this
manner; they will drift through the semiconductor at random, only slightly affected by any
applied field, until they find a recombination site (hole-electron recombination) or encounter a
junction where steep potential gradients can accelerate them into an oppositely polarized
diffusion where they are majority carriers.
Second, the resistance of the silicon through which the majority carriers flow will suffer a
voltage drop that is proportional to the current and the resistance through which it flows.
Minority carriers only cause voltage drops across the silicon through which they drift to the
extent that they recombine along the way. If magic and IC design are really related, then
perhaps we should call minority carriers “ghost” currents.
From the above, we can imagine how the positioning of the diffusion and well features can
significantly affect the design. For example, it may be a good idea to surround the N well with
solid substrate connections to GND, as close to the well as possible. Also, the N diode could
be surrounded by an N well structure that’s connected to supply.
In any case, since positive pulses to a pad will rely on the P/N well diode (transistor) to
clamp transients, one must ask how the VDD supply rail is kept from exceeding a damaging
potential; somehow, the VDD supply must be clamped to stay within reasonable voltage
values. When a transient ESD pulse is delivered to an IC pin, you cannot be guaranteed that
the part is powered from a nice and stable power supply.

Latchup
The next issue that applies to simple protection structures, but also every circuit in the IC, is
that of latchup. Let’s look at the latchup mechanism and see how it might play into our
design of simple protection devices.
Everywhere within a CMOS design we have the substrate tied to GND through P diffusion,
and we also have N diffusions tied to GND for the sources of NMOS devices. The well has
identical features, connected to VDD and the sources of PMOS devices. The diffusion
junctions, N in substrate and P in N well, are also the emitters of parasitic bipolar devices as
shown in Figure 6.3. Further, throughout our standard cell library and in virtually all of our
other circuits we will have these arrangements.
The structure is that of a silicon-controlled rectifier. If the substrate currents ever become
great enough to cause a voltage drop across RSUB1 that can turn the NPN device on, the NPN
will inject minority carriers into the substrate that can result in currents through the N well. If
the current through the N well is ever sufficient to cause a voltage drop across RWELL1 to
turn on the PNP device, its collector current could be sufficient to sustain the original voltage
drop across RSUB1, and a runaway condition of ever increasing current flow will ensue. The
maximum currents flowing will be limited by RSUB2 and RWELL2. In fact, if RSUB2 and
RWELL2 are of a high enough value in relation to RSUB1 and RWELL1, latchup cannot
occur at the operating supply voltage.
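The trigger half of this condition can be stated as a one-line check: the parasitic NPN turns on once substrate current develops a base-emitter drop (roughly 0.6 V) across RSUB1. The resistance and current values below are illustrative, not from a real process:

```python
# Latchup trigger condition as described above: the parasitic NPN turns
# on when substrate current develops ~0.6 V across RSUB1. The 200-ohm
# RSUB1 and the current levels are illustrative values only.

VBE_ON = 0.6   # volts, approximate bipolar turn-on

def npn_triggered(i_substrate_A, r_sub1_ohm):
    return i_substrate_A * r_sub1_ohm > VBE_ON

print(npn_triggered(1e-3, 200))    # 0.2 V drop across RSUB1: safe
print(npn_triggered(10e-3, 200))   # 2 V drop: NPN turns on, latchup can begin
```

The same arithmetic applied to RWELL1 and the PNP gives the second condition; both must be satisfied for the runaway to sustain itself.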
Further, if the minority carrier current generated by the NPN can be shunted off to some
destination other than the well, or if the PNP collector currents can be directed to the substrate
through a path that does not produce a significant voltage across RSUB1, then latchup will not
occur. The design of CMOS circuits without an understanding of the latchup mechanism and a
deep respect for the consequences will inevitably lead to circuits that will latch up.
The use of epi substrates very much improves a design’s resistance to latchup conditions,
since most of the PNP collector current will be

Figure 6.3 Parasitic bipolar devices coupled to form an SCR.

Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com).


Copyright ©2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
PERIPHERAL CIRCUITS Keith Barr 127

through the bottom of the well to the substrate, not across the silicon to the NPN emitter
region. In fact, the thickness of the epi layer is on the order of the N well depth; the heavily doped
epi/bulk substrate interface is immediately below the bottom of the N well. Also, most of the
minority carriers from the NPN will be recombined at the epi/bulk interface, and few will find
their way to the well itself, provided sufficient spacing is allowed. The use of epi can
significantly improve resistance to latchup in CMOS, but as you can see, the adequate spacing
of these features also plays an important role.

Lateral Bipolar Devices


More recent protection structures make use of lateral bipolar transistors and the snapback
effect of all MOSFETs, which we should consider separately.
Lateral bipolar transistors are fabricated in CMOS by the use of the polysilicon gate to
allow the closest spacing between emitter and collector regions. The spacing of isolated
diffusions is limited by the mechanism through which FOX is grown, which in effect defines
the active areas; the patterning of poly gate material, which also masks the implanted
diffusions, allows much closer diffusion spacing. Therefore, the lateral bipolar devices may
look like MOSFETs, but are, in fact, bipolar transistors.
The schematic of this structure is that of NMOS and PMOS devices, but, in fact, since the
gates are tied to source in each case, they will hardly conduct as MOSFETs. Instead, a schematic showing an NPN and a PNP device, although not appropriate for LVS checking (extraction will see the layout as MOSFETs), is more appropriate from an analysis point of view. The
NPN will conduct during negative pad transients, and the PNP will conduct during positive
pad transients.
Referring to the cross-sectional view of Figure 6.4, the emitter region is surrounded by the
collector regions, which gives the minority carriers, generated by the emitter, the best chance
of being collected. Otherwise, the minority carriers will combine in the substrate, causing a
substrate (or well) current; we want the currents to flow to the power and ground terminals,
which have good conductivity, not through the substrate, which has relatively high resistance
to the GND connection.
When designing this structure in a process that includes a silicide option (which drastically
lowers diffusion resistance), you may find helpful advice offered by the foundry rules. They
will suggest blocking the silicide in the “drain” region of the MOSFETs, which is more
accurately described as increasing the resistance in series with the lateral bipolar emitters. This
is done so that “hot spots” do not develop that could damage the diffusion junctions during
peak stress. Recalling bipolar


Figure 6.4 NMOS and PMOS structures that can function as lateral bipolar devices.

transistor behavior, we know that the “on” voltage of a bipolar device is reduced by about 2
mV/°C. If one spot gets hotter than another, that spot will begin to conduct more
aggressively, and with increased currents, get hotter still; this thermal runaway condition can
be relaxed by the inclusion of a small resistance along the edge of the emitter, uh, drain
junction.
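The stabilizing effect of that small resistance takes only a few lines of arithmetic to see. In this sketch, the temperature rise, extra current, and ballast resistance are illustrative assumptions; only the 2 mV/°C tempco comes from the text:

```python
# Bipolar "on" voltage falls roughly 2 mV per degree C (from the text)
TC_VBE = -2e-3  # V/°C

def extra_drive_voltage(delta_t_c, i_extra, r_ballast):
    """Net increase in drive across the junction for a hot spot that is
    delta_t_c degrees hotter and carrying i_extra more amps.
    Positive -> the spot keeps running away; negative -> the ballast
    resistor's IR drop has absorbed the thermal advantage."""
    return -TC_VBE * delta_t_c - i_extra * r_ballast

# Without ballast, a 10-degree hot spot gains 20 mV of drive -> runaway:
print(round(extra_drive_voltage(10, 0.05, 0.0), 3))   # 0.02
# With an assumed 0.6-ohm ballast, 50 mA of extra current drops 30 mV,
# more than cancelling the 20-mV thermal gain:
print(round(extra_drive_voltage(10, 0.05, 0.6), 3))   # -0.01
```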

The Snapback Phenomenon


The snapback phenomenon is a bit different, but the required layout is identical, and in this
case, the MOSFET schematic is correct.
The protection mechanism depends on the Zener avalanche characteristic of the drain
region, exacerbated by the close proximity of the source junction. The mechanism requires
bipolar transistor terminology to explain, but is a fundamental characteristic of the MOSFET.
The mechanism can be described by examining the NMOS device when its drain terminal is
severely stressed by a high positive potential.
When the drain junction enters the avalanche region of operation, the potential gradient
within the depleted region accelerates electrons to a kinetic energy that is sufficient to knock
electrons out of the crystal lattice, creating electron-hole pairs, as shown in Figure 6.5. These
additional carriers cause substrate currents that can increase the substrate potential, locally to
the drain diffusion, and since the source junctions


Figure 6.5 Depletion region around a back-biased diffusion.

on each side are grounded, the source terminals will act like NPN transistor emitters,
generating electrons as minority carriers into the substrate. As the junctions are very close, the
emitted minority carrier electrons can find their way to the depleted drain region, offering
more electrons to participate in hole-electron pair production; the effect is known as snapback,
which is readily apparent when looking at the I/V plot of the device as shown in Figure 6.6.
As a current is applied to the drain junction, the drain voltage increases rapidly, limited by
leakage of the drain junction, and the finite off current of the NMOS device, which is
operating deep in the subthreshold region. Avalanche begins at about 10 V, and the drain
voltage reflects a Zener-like characteristic. At a certain point, the substrate currents are
sufficient to raise the local substrate potential to the point where the NMOS source begins to
act like an NPN emitter, injecting electrons into the substrate, in extremely close proximity to
the drain depletion region. These additional minority carrier currents incite the snapback
process, where the drain terminal potential required

Figure 6.6 Snapback characteristic.


to sustain snapback is significantly lower than that required for initiating snapback. As the
currents are increased, the resistance of connecting structures determines the resulting drain
terminal voltage.
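The difference between the initiating and sustaining levels can be captured in a crude behavioral sketch of the I/V curve of Figure 6.6. The trigger voltage comes from the text ("about 10 V"); the holding voltage and series resistance below are assumptions for illustration, not process data:

```python
V_TRIGGER = 10.0  # avalanche/trigger voltage (text: "about 10 V")
V_HOLD = 5.0      # sustaining voltage after snapback (assumed, < trigger)
R_ON = 2.0        # resistance of connecting structures, ohms (assumed)

def drain_voltage(i_drain, snapped_back):
    """Terminal voltage for a forced drain current: before snapback the
    junction sits near the trigger voltage; afterwards, the lower holding
    voltage plus the IR drop of the connecting metal and diffusion."""
    if not snapped_back:
        return V_TRIGGER
    return V_HOLD + i_drain * R_ON

print(drain_voltage(0.01, snapped_back=False))  # 10.0 : initiating level
print(drain_voltage(0.01, snapped_back=True))   # 5.02 : sustaining level is lower
```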
This snapback property, in conjunction with the parasitic bipolar mechanism for handling
reverse potentials, allows large transient pad currents to be conducted through metallization,
which has resistivity controlled by design, as opposed to through the substrate, which has poor
conductivity in general.

Minority Carrier Injection to the Substrate


The closeness of the junctions, formed by the use of poly gate as a masking device, allows these structures to contain high transient currents within the protection structure, and allows the
currents to be carried by metallization to VDD and GND terminals. Both the bipolar as well as
the snapback mechanism, however, do contribute significant currents to the substrate, as both
majority carriers that can directly affect local substrate potentials as well as potentially long-
ranging minority carriers. Good substrate connections in the vicinity of the protection devices
can help keep substrate potentials to reasonable values, but minority carrier injection can be
stopped by an N well that surrounds the offending circuitry (or the sensitive circuitry),
connected to VDD. Minority carriers will either be recombined at the epi interface or trapped
into the biased N well; few will be able to find a path under the well.
Such structures should not be required universally, but could be used to advantage to
protect very sensitive circuits from minority carrier interference, as shown in Figure 6.7. Any
logic signal input pad could be subjected to signal undershoot (during normal operation) that
would

Figure 6.7 Minority carrier generation into the substrate from forward biased n diffusion in p
substrate. Well biased to supply captures minority carriers, protecting internal
circuitry from being affected.


briefly turn on the negative protection device, generating a burst of minority carriers. Sensitive
analog circuitry that is placed nearby can be severely affected by such events. Minority carrier
range is generally limited to a few hundred microns in bulk silicon, and a small fraction of that
on epi substrates.

Supply Clamping and Supply Rail Conductivity


In all cases, there may be a need to clamp the VDD supply, so that electrostatic discharge to
the VDD pin does not cause damage. This is accomplished in two possible ways.
The first is the snap back clamp, simply an NMOS device across the supply terminals,
which is shown in Figure C6.1 of the color section. This snapback VDD clamp is built with
features that may not be obvious from the printed layout, so allow me to point out some
important details: First of all, it is not as narrow as it could be, with 10-μ wide strips
connecting the seal ring at the bottom to the GND bus. All features that are not electrically
connected to the pad metal are spaced away by 20 μ. Contacted P diffusion is placed under the
side rails, which extend to the top of the cell; here, GND can be obtained for internal circuits.
The upper M2 bus is 40-μ wide and carries VDD, the one below it is 30-μ wide and carries
GND. The space between the power busses is 5 μ.
The pad is 100 μ square with an 88-μ pad opening in the overglass, which makes the entire cell
160-μ wide. If all of our I/O cells are built with this frame, the pad pitch will be 160 μ, easily
accommodated at any packaging house. A 4-mm-square project, however, will only be able to
accommodate 22 such I/O pads along each side.
The worst-case resistance from the VDD pad to the farthest point on a 4-mm-square part would be 3 Ω, assuming a 0.03 Ω/sq M2 sheet resistivity (rough-cut calculation: halfway around is 8000 μ/40 μ, or 200 squares; times 0.03 Ω/sq is 6 Ω; the two parallel paths give 3 Ω). The GND bus is
reinforced by the conductivity of the seal ring, and has a somewhat lower resistance.
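The rough cut generalizes to a one-line estimate. This little sketch just restates the arithmetic above for any square die with a peripheral supply bus:

```python
def bus_resistance(die_side_um, bus_width_um, sheet_ohms_per_sq=0.03):
    """Worst-case resistance from a supply pad to the far side of a square
    die: current can flow both ways around the ring, so the two half-
    perimeter paths appear in parallel."""
    half_perimeter = 2 * die_side_um          # halfway around a square die
    squares = half_perimeter / bus_width_um   # number of sheet "squares"
    one_path = squares * sheet_ohms_per_sq
    return one_path / 2                       # both directions in parallel

# The 4-mm-square part with a 40-um-wide VDD bus from the text:
print(bus_resistance(4000, 40))   # 3.0 ohms
```

Widening the bus or shrinking the die scales the result directly, which is worth checking before committing to a bus width.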
The cell used for snapback protection is composed of a single cell arrayed by the layout tool
to occupy six overlapping positions. Each cell is a pair of NMOS devices, each with a gate 0.6-μ long and 63.6-μ wide. Since all devices are in parallel, the total NMOS device is effectively
763.2-μ wide. The gate is spaced from the contacts by 3.8 μ; the effective source or drain
resistance is about 0.6 Ω, which is a trade-off between voltage drop at high current stress and
localized heating, which could lead to thermal runaway. Notice the contacts are positioned
somewhat in from the ends of the NMOS device, since the sharp corners of the diffusions will
likely be the places where hot spots will originate, and poor contacting in these regions
increases series resistance as a damage protection measure. Each cell is surrounded by a
grounded P diffusion ring.
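The effective width is just the product of the numbers above; a quick sanity check in Python:

```python
# Pure arithmetic from the clamp-cell description above
cells = 6              # arrayed positions of the unit cell
nmos_per_cell = 2      # each unit cell is a pair of NMOS devices
finger_width_um = 63.6 # width of each NMOS finger, in microns

total_width = cells * nmos_per_cell * finger_width_um
print(round(total_width, 1))   # 763.2 : effective width, all fingers in parallel
```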


A second kind of supply clamp is used in the more advanced processes, essentially a very
large MOS device that can short the supply to ground if the voltage rate of rise is very fast, as
would be the case during an electrostatic event. This cell is designed to be included
occasionally between pads, and is shown in Figure C6.2 in the color section.
The clamp cell is composed of an array of NMOS capacitors at the bottom, the source/drain
connections of which are made to ground. The gates of the NMOS caps are connected to an
M1 strip that runs up the middle of the cell to a poly resistor at the top. The poly resistor is
connected to VDD. In between, we have an array of PMOS devices with their sources and
well connected to VDD, their drains connected to ground, and their gates connected to the
center M1 line. The total PMOS device width is about 1200 μ. When the rate of rise of VDD
is sharp, the PMOS devices are briefly turned on, clamping the VDD rail to some limiting
potential. The time constant of the MOS capacitance and the pull up resistor is such that a
normal power-on VDD rate of rise will not turn on the PMOS devices.
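The RC discrimination can be sketched in a few lines: the gate node tracks VDD through the poly pull-up into the NMOS-capacitor array, so the PMOS clamps see gate-to-source drive only while VDD rises faster than the RC can follow. The component values and threshold below are assumptions chosen only to illustrate the timing argument:

```python
R_PULLUP = 100e3   # poly pull-up resistor to VDD, ohms (assumed)
C_GATE = 10e-12    # total NMOS-capacitor array, farads (assumed)
TAU = R_PULLUP * C_GATE   # 1-microsecond time constant

def clamp_turns_on(vdd_rise_time, vdd=3.3, vth_p=0.7):
    """Approximate gate lag for a linear VDD ramp: the gate trails VDD by
    roughly TAU * slope (capped at the full supply for very fast edges).
    The PMOS clamp conducts if that lag exceeds its threshold."""
    slope = vdd / vdd_rise_time
    lag = min(slope * TAU, vdd)
    return lag > vth_p

print(clamp_turns_on(1e-9))   # True:  nanosecond ESD edge -> clamp fires
print(clamp_turns_on(1e-3))   # False: millisecond power-on ramp -> clamp stays off
```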
Although P devices are not as strong as N devices, talk around the campfire is that N
devices, although more conductive, are more prone to damage in this application, and P
devices are preferred. Here’s another example of how rumor and fear drive IC structure ideas
into the realm of the superstitious. In fact, even fabs are reluctant to bear the cost of test masks
and engineering time to fully prove every possible arrangement of parts. The library of I/O
cells you get from your foundry will have been tested, but it may be difficult to get good data
on how well they actually work.
The CLAMP cell can be thrown in anywhere between pads. The terminal between the VDD
and GND bus in the CLAMP cell is labeled “clamp,” which is connected to the device
gates. When numerous CLAMP cells are used around the padframe, they may be all
interconnected with an M2 strip that lies between VDD and GND, causing all of the clamp
devices to activate simultaneously. The technique of using the CLAMP cell becomes more
important as feature dimensions are reduced; 0.6 μ circuits will probably not require them, but
0.35 μ and finer will.

Protected Pad Design


Depending on die size and pad count, your design will be either pad limited or core limited. Pad-limited designs will require that pads be placed on a close pitch, with the protection
devices built under wide power and ground busses that surround the core circuitry. Since
standard cell circuits are not built to operate in harsh environments, it is advisable to build
your protection circuits so that while clamping a high current pulse they do not severely
disturb the substrate within the core. This may


require extra effort in designing guard structures that protect the core from extreme electrical
activity within the protection circuits.
Core-limited designs allow protection circuits to be built between pads, and deliver a greater
spacing margin (under the power and ground busses) between the protection circuits and the
core. Further, core-limited designs may benefit from using the seal ring as an additional
ground bus, with connections from the seal ring to the inner ground bus between bonding pads
and their protection devices.
When designing pad protection for pad-limited designs, the pad pitch will be finer, and you
may wish to use multiple VDD/GND bus runs (2 VDD, 2 GND) and make these busses wide,
so that you have sufficient room for the protection devices themselves and any associated
drive or receiving circuitry. When core signals interface with I/O cells, they should do so
indirectly, through buffers that have better latchup resistance than core cells would have.
Never run a line directly from a pad into the core, unless it is a power signal or an analog
signal that has been very carefully considered.
In extreme environments, you may wish to include a second level of protection on your
signal input pads, which could simply be a resistance in series with the signal coming from the
primary protection devices to smaller secondary protection devices.
I/O cells can often be drawn as basic devices with primary protection in place, leaving a
space where I/O circuitry can be built to interface with core signals. Each I/O cell will then be
simply the basic cell with added circuitry to cause specific functions like Schmitt input, tristate
output, and so forth. Although significant amounts of circuitry can be built into an I/O cell,
like registering and complex gate functions, I suggest building a minimal set of cells and
building more customized ones only as the need arises.
The PAD_SCHMITT_TS is perhaps the universal pad for logic signals, the layout of which
can be found in Figure C6.3 of the color section. Details of the input and output control are
found (also in the color section) as Figures C6.4 and C6.5. The schematic of the PAD_SCHMITT_TS pad is shown in Figure 6.8.
The only signals that enter your design are analog signals or logic signals. Logic signals
should be sent through a Schmitt trigger on the way, or false data edges will be transferred to
the core due to substrate noise. The standard cell library does not contain a Schmitt cell, since
analog signals that may need to be cleaned up should never be sent to an autorouted layout—it
will be too noisy an environment; clean up your analog signals in a quiet environment before
you send them to the core as logic signals. You may ignore this suggestion if you like for data
signals at an I/O pad, but NEVER send a clock signal into an IC without a Schmitt trigger in
the I/O pad signal path. Ahem, NEVER!
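A quick behavioral model shows why the Schmitt trigger matters here. The thresholds and waveform below are illustrative assumptions; the point is that noise smaller than the hysteresis window cannot toggle the output, while a single threshold chatters:

```python
def schmitt(samples, v_hi=2.0, v_lo=1.0, state=0):
    """Two thresholds: the output only rises when the input exceeds v_hi
    and only falls when it drops below v_lo, so ripple inside the
    hysteresis window produces no edges."""
    out = []
    for v in samples:
        if state == 0 and v > v_hi:
            state = 1
        elif state == 1 and v < v_lo:
            state = 0
        out.append(state)
    return out

# A rising edge with substrate-noise ripple around a single 1.5-V threshold:
noisy = [0.0, 1.4, 1.6, 1.4, 1.6, 1.4, 2.5, 3.3]
single_threshold = [1 if v > 1.5 else 0 for v in noisy]
print(single_threshold)   # [0, 0, 1, 0, 1, 0, 1, 1] : several false edges
print(schmitt(noisy))     # [0, 0, 0, 0, 0, 0, 1, 1] : one clean edge
```

For a data line the false edges may merely be annoying; on a clock line each one is an extra clock, which is why the rule above is absolute.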


Figure 6.8 Schematic of the PAD_SCHMITT_TS cell.

The PAD_SCHMITT_TS is an input pad and a tristate output pad. Y is always the Schmitt-
buffered logic state of the pad. A is the signal that is transferred to the pad when the tristate
enable E is high. The cell can be programmed to be a simple output pad by tying E high, or as
an input-only pad by tying E low and A to either logic potential. The signal from the pad is
conducted through a 2.4-kΩ poly resistor to an N and a P diode structure (upper right corner)
prior to entering the Schmitt trigger circuit. Poly is also used for some of the wiring in the
Schmitt trigger, as a two metal process makes wiring in tight spaces difficult.

Low RFI Pad Design


The N and P devices that drive the pad are the protection devices. The N device is driven by a
NOR gate that accepts A and not E (from an inverter), and the P device is driven by a NAND
gate that accepts A and E. The gates are specially scaled so that the driven devices turn off
abruptly, but turn on rather slowly. This is so that both output devices are never on
simultaneously and also so that RF interference resulting from sharp pad transitions can be
minimized.


On this point, when you design your own I/O cells, you have the ability to make them
behave in any way you like. Many libraries will contain I/O cells that drive the output devices
really hard, and although they will respond quickly, they can cause problems during RFI
testing. With careful attention to I/O pad slew rates, you can design ICs that are extremely
quiet in the screen room.
I once built a signal processor with internal MOSFET supply bypassing and carefully
designed I/O pads. An internal PLL took a low frequency input and produced a 50 MHz
internal process clock, driving several memories and a fast arithmetic unit. In the final
product, a standard microcontroller was used for handling a simple user interface. The product
showed virtually no RFI from the DSP chip, but the microcontroller had to be dealt with
through extra bypassing and resistors inserted into its I/O lines. This is just one example of
how you can do extraordinary things when you understand each aspect of your design.
You may need very high speed at the pads, and for this you must be prepared to use the IC
in a careful PCB layout. High speed pins can be very high speed (sub-nanosecond), but will
require frequent power and ground pins to supply the required transient currents. It is not at all
uncommon to place a VDD pad, two signal pads, a GND pad, two signal pads, a VDD pad,
and so on along one side of a part that is to communicate at high speed with an external device
(such as an SRAM).
Conversely, microcontroller interfaces can easily get by with pad delay and rise/fall times
on the order of hundreds of nanoseconds. Do not be afraid of scaling your pad output device
drivers to what would be considered ridiculous in a commercial part; if your design can
benefit from any deviation from “normal,” do it.
All padframe cells will not only conduct signals safely into and out of the core, but will also
conduct power and ground. In the examples I’ve given, the M2 rail on the core side is VDD,
so internal circuitry can access this directly. The M2 ground bus, however, is unavailable to
the core. Notice that the GND signal is carried between the I/O cells to the core edge by M1,
and is labeled with a GND pin on the cell’s upper edge.
A final note on the example PAD_SCHMITT_TS cell—although it will perform very well
in commercial products, the protection circuits are a bit small for super-robust applications. If
a very harsh operating environment is expected, the protection devices should be, perhaps,
twice as large, and the space to the core circuitry would also need to be increased. I’ve used
a 40-μ VDD bus and a 30-μ GND bus, which may need to be extended to 50 μ and 40 μ, with
an additional pair of smaller GND and VDD busses (perhaps 20 μ each) on the core side to
power the control circuitry.


Specialty Logic Structures and Memory


So far, we have looked at all kinds of IC structures, but every one of them is probably
available from your foundry as standard cells and I/O pads, neatly designed and
characterized for your convenience. If you stop reading at this point, you may have
learned a bit about the IC industry, a few useful tips here and there, maybe even a better
understanding of how the library was built, but you will have missed the best part: those
things that aren’t in a foundry’s library.
In fact, I only suggest building your own library as an exercise in circuit design and
layout, getting familiar with your tools, and ultimately knowing why your library is the
way it is. Accepting a foundry’s library without full knowledge of its development
leaves you manipulating mysterious objects that cannot be modified without some degree of apprehension, and producing your own cells will remain a distant dream.
The really cool part about custom IC development is not that you can take a netlist and
make an entire chip by pressing a few buttons (although that is pretty neat…); instead, it is that you can develop whole processing machines in extremely tight spaces that run at
potentially blazing rates; and analog… Well, we’ll leave that ‘till later.
Right now, I’d like to introduce you to ways that custom cells can be designed, not
ones to be used in autoroutes but ones that don’t need to abide by specific rail heights or
well locations; in other words, cells that can be arrayed to do special functions efficiently.
The ultimate extension of this approach is memory design. There are just two examples
I’ll give, and I will describe them only in words. Not all design challenges are this elegant; sometimes a big rat’s nest of logic is the only way to get a job done, but many
applications can be reduced to high speed arrays that


are simple, elegant, and, often, against the first-cut approach that a designer may take.
Learning to create and array custom cells gives you a power that is unrivaled in the
electronics design field.
The first example is of a chip for accumulating a serial data stream into successive
memory bins (a binary correlation process), which had to run at a rate that made the use
of a single SRAM impossible; the time required to access a memory value, read it out,
add the input bit, and write the value back was much longer than the incoming bit period,
even in 0.13-μ technology. Further, the system was to be used in an environment where
any electrical interference produced by the design had to be at the bit rate only; lower
frequency interference would be easily picked up by the sensitive analog input circuitry.
The solution was to make a custom resettable ripple counter cell with a tristate output,
and array it into a huge block. After resetting, the array could accumulate input bits into
successive bins extremely quickly, despite the fact that a ripple counter is well understood
to be a slow function; it can, however, accept clock rates that exceed those of more complicated synchronous counters. Careful layout of power and data lines to the counter
array allowed an IC to exist, which couldn’t have been done through any other simple
means. I shudder at the thought of the schematic being autorouted: how large the result
would have been and how poorly the timing would have been defined. The part worked
really well in a 0.6-μ technology. The cell was made much smaller than a reset flip-flop
from a standard cell library, with interconnection ports on all sides so that abutting cells
would connect power and signals after being automatically arrayed by the layout tool.
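The function the array performs is simple to state in software. Here's a behavioral model of it; the class name, bin count, and bit pattern are illustrative, with one resettable counter standing in for each hardware cell:

```python
class CorrelatorArray:
    """Behavioral model: one resettable counter per bin, accumulating a
    serial bit stream into successive bins, wrapping around."""
    def __init__(self, n_bins):
        self.bins = [0] * n_bins

    def reset(self):
        self.bins = [0] * len(self.bins)

    def accumulate(self, bitstream):
        # Each incoming bit is added into the next bin in sequence,
        # as the hardware does with one counter cell per bin.
        for i, bit in enumerate(bitstream):
            self.bins[i % len(self.bins)] += bit

corr = CorrelatorArray(4)
corr.accumulate([1, 0, 1, 1,
                 1, 0, 0, 1])
print(corr.bins)   # [2, 0, 1, 2]
```

The hardware version wins because each counter updates locally in one bit period; no shared memory read-modify-write cycle ever occurs.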
The second example involves digital filter functions—particularly the FIR filter, used
in interpolators and decimators (polyphase filters)—that usually require a multiplier and a
coefficient ROM, and multiple registers, either at the input of the function, or as
accumulators for the function’s output. Custom cells can be made to array into a
rectangular space, consisting of multipliers and adders that abut to the memory terminals
directly, forming an extremely dense and fast logic block.
For this purpose, imagine a register bank of 256 registers, each 24 bits wide, and each
register with a tristate output. This can act as an accumulator of products from a
multiplier through an adder, and constitutes approximately 50K gates. The clock line for a
given register would enable a register output when low, delivering its stored value to the
adder, and then clock in the new summed data to the same register on the clock line’s
rising edge. This may be a bit clever, but it’s really simple and straightforward.
Now imagine that same basic function being done with 257 tristate latches, each 24 bits wide,
each of which occupies about half the space of a full register, where the output of one
latch is conducted to the accumulator adder


while a neighboring register simultaneously accepts the adder’s result. The structure
becomes a circle of storage devices, as the updated information is always moved to the
neighbor, hence, 257 latches, not 256. Further, since such processes are continuous as a signal
processing function, the latches can be dynamic, taking even less space than the static latch.
The twisted logic of the process, with data essentially moving through the circular array as the process continues, does require some thought in planning, but the result is
extremely small compared to the more straightforward register design, which, in turn, is
smaller than an autorouted design due to the arraying of cells. The careful planning of signal
processing architectures can reveal novel results that are small, consume little power, and are
fast. Only when you create the parts yourself can you take full advantage of your own
cleverness.
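To make the circular-latch idea concrete, here is a behavioral sketch: each step reads one latch, adds a new product, and writes the sum into the neighboring latch, so the accumulators walk around the ring, and one extra latch provides the free slot. The sizes are scaled down (5 latches for 4 accumulators, instead of 257 for 256) to keep the trace readable:

```python
N = 4                     # number of logical accumulators (text: 256)
latches = [0] * (N + 1)   # one extra latch, as in the text (257)
ptr = N - 1               # read pointer; the slot ahead of it is free

def step(product):
    """Read the current latch, add the product, store into the neighbor.
    The read pointer then retreats, so each accumulator is revisited
    every N cycles, one position further around the ring."""
    global ptr
    nxt = (ptr + 1) % (N + 1)
    latches[nxt] = latches[ptr] + product
    ptr = (ptr - 1) % (N + 1)

# Accumulate the products 1..4 into the four accumulators, two passes:
for p in [1, 2, 3, 4] * 2:
    step(p)

# The live accumulators now hold 2, 4, 6, 8 (each product summed twice),
# at rotated physical positions; one latch holds a stale copy.
print(sorted(latches))   # [2, 4, 4, 6, 8]
```

Tracing a pass by hand makes the "data moving to the neighbor" picture in the text click; the planning effort goes into keeping track of which physical latch holds which logical accumulator at any cycle.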
Once you allow yourself the luxury to imagine custom IC cells, take a look at the products
you’re currently working on, or ones that your company has a potential market for. Ideas
will come to mind. Once you’ve learned the development tools, your first move may be to
start a special cell layout, testing the possibilities. Only later will you document it in a
schematic. At this point, you’re squarely in the center of the sandbox, hard at play.
To better understand the issues and techniques of arraying custom cells, we’ll look to
memory design.

Custom Memories
Memories are built as an array of core cells, supported by circuitry that abuts to the core; the
block level layout is shown in Figure 7.1.

Figure 7.1 Block view of memory layouts.


The wordline decoder responds to some of the address inputs and accordingly selects one of
many wordlines that span across the core array, enabling a single row of memory cells. The
column processing block responds to the remaining address lines, selecting vertical bitlines
that also pass through the array, arranged in columns. The corner is reserved for timing and
coordination of the incoming address lines. All data is transferred through the column
processing block, with I/O terminals along its bottom edge.
The memory may be divided into vertical bit sections, where all bit sections are identical. A
20-bit wide memory would contain 20 vertical bit sections, and each section would have
numerous vertical bitlines within. Within a bit section, the number of bitlines is usually a 2^N
value, like 4, 8, 16, 32, and so on. The number of wordlines is also usually a 2^N value. This
is so that all combinations of a group of addresses (one group controlling rows, the remainder
controlling columns) will have unique meaning. Addresses and control signals are connected
to the memory at the corner block.
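The address split can be sketched in a few lines. The sizes below are illustrative (64 wordlines, 8 bitlines per bit section), and the choice of which bits pick rows versus columns is arbitrary, as long as every combination is unique:

```python
N_WORDLINES = 64   # 2^6 rows (assumed for illustration)
N_COLS = 8         # 2^3 bitlines per bit section (assumed)

def decode(address):
    """Split a flat address into (row, column): here the low bits select
    the bitline and the high bits select the wordline."""
    col = address % N_COLS
    row = address // N_COLS
    assert row < N_WORDLINES, "address out of range"
    return row, col

print(decode(0))     # (0, 0)
print(decode(19))    # (2, 3)
print(decode(511))   # (63, 7) : last cell of a 512-word memory
```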
You can build your memories as wide or as tall as you wish, so they may abut to arrayed
processing circuitry directly, or fit into a remaining corner of your core design and be
interconnected via a bus of wires. You can also control how fast the device is, which will
affect peak power consumption; fast designs will require very sharp supply and ground
currents when accessed, and this may cause problems in ASICs, which also include sensitive
analog circuitry. Usually, the memories available from a foundry (often at additional cost) are
of the highest speed variety and don’t allow a speed/noise trade-off. When you build your
own memories, you have complete control over speed/power issues.

The Memory Core: SRAM


Each bit section in the core of an SRAM is composed of an array of six-transistor cells I’ve named SRAM_CELL, shown schematically in Figure 7.2, with the layout in the color section
as Figure C7.1. The cell is fully differential, and can be understood as two minimum-sized
inverters in a feedback loop. m1 and m3 drive the inputs of m2 and m4, which, in turn, drive
m1 and m3 inputs. Therefore, the cell can be stable and draw no supply current with the drain
of m3 low while the drain of m4 is high, or vice versa. The access transistors m5 and m6
connect the cell to the differential bitlines BL and BLN when the wordline W is high. Through
these bitlines, the cell can be read out or written into, depending on how the bitlines are
manipulated by the column processing circuitry.
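The cell's read and write behavior can be sketched at logic level (a hedged Python stand-in; device sizing, precharge levels, and all analog margins are ignored here):

```python
# Logic-level sketch of SRAM_CELL: two cross-coupled inverters plus two
# access switches (m5/m6). Purely illustrative; not a circuit simulation.

class SramCell:
    def __init__(self, state=False):
        self.q = state          # one inverter output; its complement is implied

    def access(self, wordline, bl, bln):
        """With W high: forcing one bitline low writes the cell; leaving
        both bitlines high (precharged) reads it non-destructively."""
        if not wordline:
            return None         # cell isolated from the bitlines
        if bl and not bln:
            self.q = True       # write 1: BLN forced low
        elif bln and not bl:
            self.q = False      # write 0: BL forced low
        return self.q           # read: state appears on BL/BLN

cell = SramCell()
cell.access(True, bl=True, bln=False)                # write a 1
assert cell.access(True, bl=True, bln=True) is True  # read it back
```

The two stable states of the feedback loop correspond to `self.q` being True or False with no supply current drawn.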
The SRAM cell layout shows four vertical M2 lines, which are GND, BL, BLN, and GND
again. The upper, horizontal M1 line is VDD. The vias at the bottom connect m5 and m6 to
the bitlines.

Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com).


Copyright ©2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
SPECIALTY LOGIC STRUCTURES AND MEMORY Keith Barr 141

Figure 7.2 SRAM cell schematic.

This cell is intended to be flipped vertically and overlaid with another SRAM_CELL to
form a cell named SRAM_CELL2, which is shown as Figure C7.2 in the color section.
This pair of cells can now be overlaid side to side and top and bottom into an array. The
array looks incomprehensible if the component parts are not understood.
Figure C7.3 of the color section shows a part of an array of these cells. The vertical M2
GND lines overlap side to side, and the M1 VDD lines overlap top and bottom. Notice that the
cells could be packed closer if Vias were allowed to be stacked directly upon contacts, and the
height of the SRAM_CELL2 would be somewhat reduced.
Notice also that the poly wordlines running across the SRAM_CELL are connected to M1
strips that run parallel to the poly lines. The M1 lines are not always necessary, as poly is a
reasonable conductor, and can pass the wordline control signal across the array, but the
capacitive loading of the access transistor gates will cause wide arrays to have slow signal
propagation along the wordline. So that the wordlines may respond quickly all the way to their
furthest ends, this technique of metal backing is employed, and in this case no increase in cell
dimension is required to do so. In cases where space is not available, metal-backed poly wordlines may not be possible, forcing the design either to run at a slower pace or, where speed is a factor, to make the width of the array intentionally short. In some arrays,
high speed and large size mandate that cell dimensions be increased to accommodate the
metal-backed wordlines.


Notice also that the PMOS devices are within a well that is strongly connected to VDD by
N implanted active, under the M1 VDD rail. The NMOS devices, however, do not have any
substrate contacting in the cell, as the space to do so does not exist. We have two options here;
either make the cell itself larger to accommodate substrate contacts, which would increase
both width and height, or rely on the excellent contacting within the well to suppress the
tendency toward latchup. If we choose the second option, we must make sure that good
substrate contacting is provided on the sides of each bit column array, between bit columns.
The use of epi substrates will allow greater width of the bit columns; when fabricating in bulk silicon (non-epi), the greatest distance between an N device and a substrate contact should be no more than perhaps 20 μ. In epi, considerable distances can be tolerated, provided any
VDD-connected P devices in the area have excellent N well contacting.
An array is generated by instantiating a single SRAM_CELL2 cell, and then editing it to
array it into any number of horizontal and vertical copies with suitable vertical and horizontal
pitch dimensions; a section of such an array is shown in Figure C7.3.
A single bit section, that is, a grouping of cells that will correspond to a single I/O bit, can
be delineated from adjacent bit sections with a cell that can perform substrate contacting while
passing the wordline signals from bit section to bit section; I call this cell SRAM_TIE, which
is shown in the color section as Figure C7.4.
SRAM_TIE is intended to be arrayed to the bit section’s height, and connects the
horizontal M1 VDD lines vertically through an M2 strip; this places VDD throughout the
array into a grid. This VDD signal can also be passed by the M2 strip into the column
processing circuitry at the bottom of the memory.
GND is obtained from the closest vertical M2 within the SRAM_CELL2 array, and is used to contact substrate with P implanted active under the SRAM_TIE cell.
M1 strips carry the wordlines across the SRAM_TIE cell.
A block of memory, 1 bit wide, can now be built. We'll make an SRAM bit section that is
four SRAM_CELL2 cells wide and four cells high as an example. This small block only
contains 32 bits of data, but such small SRAMs are useful; larger ones can be built using the
same basic techniques, but would be difficult to see on a printed page. We’ll call this cell
SRAM_BIT_ARRAY.
The SRAM_TIE cells are arrayed up the right-hand side, and a copy of that array is
mirrored to fit on the left-hand side. This allows the SRAM_BIT_ARRAY cell to be arrayed
into a memory that is N bits wide, at a pitch that allows the center vertical M2 strips of the
SRAM_TIE to overlap exactly. This is shown in the color section as Figure C7.5.


Actually, we will attach the I/O circuitry to the bottom of SRAM_BIT_ARRAY before arraying it into our full SRAM, because the I/O circuitry is identical in each bit section. This
will constitute a single bit of memory that can be simulated in SPICE; depending on memory
size, it may be difficult to run SPICE on the entire array.

The Memory I/O Section


The bottom of the cell SRAM_BIT_ARRAY will need a multiplexer that selects which sets of bitlines are to be accessed, circuitry that allows reading from or writing to the selected bitlines, and, finally, circuitry to precharge the bitlines before the memory can properly be accessed.
The NMOS devices within the cell that act as access devices, m5 and m6 in our cell
schematic, are turned on by bringing the wordline to VDD, allowing access to an entire row of
cells across the memory array, connecting the internal signals of each accessed cell to their
respective vertical bitlines. Unfortunately, a single NMOS device is not an ideal switch; with
the gate at VDD, the device will cease to conduct when the source and drain terminals are near
VDD, and will only conduct well when source or drain is near ground. Therefore, the access
devices can pull a bitline to GND, but cannot pull a bitline any higher than VDD-Vt, which in
a 5 V system may be only 3.2 V (due to the body effect). Further, if the bitlines are at arbitrary
potentials when a row is accessed, the contents of the accessed cells in the row may be
corrupted; the strength of the devices within the cell may not be sufficient to overcome the
charge on the bitlines. To overcome this problem, all of the bitlines in the entire memory are
precharged to VDD before a wordline is allowed to access a row of memory.
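The VDD − Vt limit can be estimated numerically. The threshold and body-effect parameters below (Vt0, gamma, phi) are assumed illustrative values, not figures from the text; they are chosen only to land near the ~3.2 V cited above:

```python
import math

# Why an access NMOS cannot pull a bitline to VDD: with its gate at VDD,
# it stops conducting near VDD - Vt, and the body effect raises Vt as the
# source node rises. Parameters below are illustrative assumptions.

def nmos_max_pullup(vdd, vt0=0.7, gamma=0.8, phi=0.7):
    """Iteratively solve Vout = VDD - Vt(Vout) for the pulled-up node."""
    v = vdd - vt0
    for _ in range(50):
        vt = vt0 + gamma * (math.sqrt(phi + v) - math.sqrt(phi))
        v = vdd - vt
    return v

v = nmos_max_pullup(5.0)
assert 3.0 < v < 3.6   # in the neighborhood of the ~3.2 V cited above
```

This is exactly why the PMOS precharge devices, which have no such limit when pulling high, are used to bring the bitlines to full VDD.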
Precharging is accomplished by PMOS devices attached to each bitline that can pull the
bitlines to VDD. I call the control signal for this row of precharge devices PCHN, meaning
precharge-not. Bringing PCHN low will turn on the precharge devices, bringing all of the
bitlines in the array to VDD. PCHN only needs to go true (low) for a short period to
accomplish the precharge operation.
After the bitlines are precharged, access may be performed without disturbing the cell's logic state, because the cell is a balanced, differential circuit, and the strength of the access devices
is limited. Close inspection of the access transistors in the SRAM_CELL layout will show that
m5 and m6 are a bit smaller than m3 and m4. When we write to the accessed cell, we will
intentionally force one bitline high and the other low, which will overpower the internal
devices within the cell, forcing a new condition; the access devices must be strong enough to
convey this forcing condition, but not so strong as to erase the cell’s data when a cell is
accessed with both bitlines at the precharge potential (VDD).


The dimensions for m5 and m6 are fairly noncritical in this regard. In the layout, they are just slightly smaller than m3 and m4, only so that the poly-to-active DRC rule is not violated.
We can now begin to arrange devices at the lower edge of the SRAM_BIT_ARRAY cell
that will perform precharge, column selection, and I/O operations. The layout of this block is
shown in the color section as Figure C7.6.
I call this cell SRAM_BIT_OUT, but it could have been simply drawn at the bottom of (and within) the SRAM_BIT_ARRAY cell. As a cell, it can be instanced into the SRAM_BIT_ARRAY cell. Drawing it as a separate cell makes it easier to draw and edit.
There are lots of things to say about this cell, but because the layout may be hard to follow,
let’s refer to a schematic instead (shown in Figure 7.3).
The precharge devices are along the top. Below the precharge devices are the NMOS mux
devices that connect the bitline pairs, depending on address line combinations, to the write
circuit on the lower left, and the output circuit on the lower right.
The four PMOS devices above the write circuit ensure that the output of the mux actually
does pull to supply, despite the fact that a VDD level

signal cannot transfer through an NMOS device.

Figure 7.3 Schematic of SRAM I/O block.

Two of the PMOS devices precharge the mux
output, driven by PCHN; the other two constitute a cross-coupled pair. The mux output is, of
course, differential, and when one bitline is pulled low through the mux, the other is brought
to full VDD potential by the PMOS cross-coupled pair. Series N devices force the D input
signal (and its complement from the input inverter) to the mux terminals during write,
whenever WR is high.
At the output section, the differential mux output goes to a pair of cross-coupled NAND gates, organized as a false-input set-reset flip-flop. During precharge, when both mux outputs are pulled high, the SR flip-flop retains its previous state. During read or write, one of the mux
outputs will fall, setting the NAND gates into the correct state for output through the output
buffer at the lower left corner, producing Q.
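The hold-when-both-high behavior of the cross-coupled NAND latch can be sketched at logic level (the two inputs stand in for the active-low mux outputs):

```python
# Sketch of the output latch described above: two cross-coupled NAND
# gates form a set-reset flip-flop with active-low inputs. With both
# mux outputs precharged high, it holds state; during an access, the
# side that falls sets or resets it.

def nand(a, b):
    return not (a and b)

def sr_latch(sn, rn, q):
    """Settle the cross-coupled NANDs; sn/rn are the active-low inputs."""
    qn = nand(rn, q)
    for _ in range(4):          # iterate until the loop is stable
        q = nand(sn, qn)
        qn = nand(rn, q)
    return q

q = sr_latch(sn=False, rn=True, q=False)        # one mux output falls: set
assert q is True
assert sr_latch(sn=True, rn=True, q=q) is True  # both high: state held
```

During precharge both mux outputs sit high, which is precisely the hold condition, so Q remains valid between accesses.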
I drew this layout from knowledge of the requirements, because I've done it so many times before; only after making the layout did I draw the schematic, followed by an LVS, to make sure that what I intended in the layout had a proper schematic. If you do the schematic first, you have no assurance that it can be laid out efficiently; when you do the layout first, you know everything fits nicely. Sandbox designs benefit from the layout engineer also being the design engineer; designing a schematic in the vacuum of a schematic tool, and then tossing it over the wall to a layout engineer, is inefficient.
IMPORTANT NOTE: This is just one of many possible ways to organize an I/O section of
an SRAM. The basic functions will always be to precharge the array, select a column with a
mux, read the data out, and write data back as well. Although precharge is fairly obvious, all
other functions can be done in any number of different ways. You could have a D flip-flop at
the output, registering the data with a clock signal, or perhaps a simple output inverter, driven
directly from one of the mux outputs. The mux could be single pass transistors driven by gates
that decode the address signals, which could be advantageous if the number of bit pairs
coming from the array is large. The write circuitry could be a single pair of strong N devices driven by logic, perhaps providing more strength and writing to the array somewhat more quickly. It's all up to the designer.
Notice the order of the address lines coming into the side of the SRAM_BIT_OUT block.
From top to bottom they read A0, A0N, A1N, and A1. This may seem odd, but prior
knowledge of the way these lines will be conveniently driven from the corner section of the
SRAM makes this the best arrangement. When you do your first design, expect that every
block you build will require modification due to issues that arise in connecting blocks.
This cell, SRAM_BIT_OUT, can now be instanced into the bottom of the
SRAM_BIT_ARRAY cell, along with another cell I’ve called SRAM_BIT_TOP, which we
can place along the top, as shown in Figure C7.7 in the color section.


This cell will connect the vertical M2 GND strips from the array and conduct currents
across the array. One weakness in this design is that GND is conducted very nicely vertically,
but very poorly horizontally; if the memory is very wide, we should consider a wider M1 strip
for SRAM_BIT_TOP, and it would appear that M2 could also be placed onto the
SRAM_BIT_TOP cell to provide lower GND resistance. You may wish to use a wider M1
strip to carry ground, and a narrower M2 strip across the top that connects to the VDD lines
that also come up through the array; this would allow the memory to be powered from
anywhere along its upper edge.
GND is also passed horizontally in the SRAM_BIT_OUT cell, and these M1 strips could be
made wider at the expense of increasing the cell height. The use of three metal layers very
much improves the ability to make good VDD and GND connections in memory design, as we
can see that the two-metal process is capable of a dense layout; it is only deficient in VDD and
GND conduction.

The Wordline Decoder


To the left of the memory is the wordline decoder section that will accept (in this case) three
address signals (and their complements) and an enable signal, and exclusively pull one
wordline high for each possible address input combination. It is important to note here that
although you may want your design to have a certain correctness in terms of which memory
cell corresponds to which address combination, this is not necessary. For a ROM, we need this
certainty for sure, but for an SRAM (or DRAM) such correctness is not required; all we need
to know is that what is written to an addressed location can be retrieved later at that same
location. The address lines could be swapped to the SRAM, and the system would continue to
function properly.
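The point can be demonstrated with an arbitrary (hypothetical) wire permutation: as long as reads and writes pass through the same wiring, every value comes back from where it was written:

```python
# The claim above: scrambling address lines into an SRAM is harmless,
# because reads use the same mapping as writes. The particular bit
# permutation below is an arbitrary illustrative choice.

def permute(addr, order=(2, 0, 1)):
    """Move bit i of addr to bit position order[i] (a wire swap)."""
    return sum(((addr >> i) & 1) << j for i, j in enumerate(order))

mem = [0] * 8
data = [10, 20, 30, 40, 50, 60, 70, 80]
for addr, value in enumerate(data):
    mem[permute(addr)] = value             # write through swapped wires
assert [mem[permute(a)] for a in range(8)] == data   # reads still match
```

A ROM, by contrast, has its contents fixed at fabrication, so there the address wiring must match the intended map exactly.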
Before detailing the wordline decoders and drivers, a brief note on dynamic circuits is
required. We think of static RAMs as being fully static devices; as components, we apply an
address and get out (or write in) data. In fact, commercial SRAMs are dynamic inside. While
the address lines are stable, the output data is stable, but when an address line changes state,
internal circuitry detects this (with XOR gates and delays) and an access cycle is initiated. The
access cycle consists of disabling the last addressed wordline, then briefly precharging the
array, and then reenabling the new wordline condition, producing an output. The memory
appears static to the user, but is in fact dynamic in its operation. The timing delays that control
precharge duration and the nonoverlapping of events are built into the device at the design
level. Notice that static RAMs draw trivial power while idling with fixed addresses, but once
just one address line changes state, the entire part draws a severe spike of VDD current.
For memories within an IC, we have the ability to pass clocking information to the memory,
along with addresses, where the timing of the two


is derived by our system's logic circuitry.

Figure 7.4 Typical timing diagram for SRAM signals.

ASIC memories are clocked devices. I like to
design my memories so that the memory clock is low to access data, and while high, the array
is precharged. Of course, the wordlines are all driven low during precharge, so the memory
cells are not “fighting” the precharge condition. During precharge, while the clock is high,
the address conditions may change, but while clock is low, the system circuitry must ensure
that the addresses are stable. Also, the write control line is only allowed to go true while clock
is low. When you design your SRAM, you will specify these parameters in the Verilog model,
so that any logic conflicts from your interconnected circuitry will be revealed.
A typical timing diagram is shown in Figure 7.4.
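In the book's flow this discipline would be captured in a Verilog model; as a hedged stand-in, a behavioral sketch of the clocking rules (clock high = precharge, clock low = access, write only while the clock is low) might look like:

```python
# Behavioral sketch of the clocked-SRAM discipline described above.
# Python stands in for the Verilog model; no electrical behavior or
# nonoverlap timing is represented, only the access protocol.

class ClockedSram:
    def __init__(self, words=8):
        self.mem = [0] * words
        self.clk = True                 # clock high: array is precharged

    def cycle(self, addr, wr=False, data=0):
        """One full clock cycle: addr must be stable while clk is low,
        and WR is only honored during the low (access) phase."""
        self.clk = False                # falling edge: access begins
        if wr:
            self.mem[addr] = data       # write while the cell is accessed
        q = self.mem[addr]              # read data appears at Q
        self.clk = True                 # rising edge: precharge again
        return q

sram = ClockedSram()
sram.cycle(3, wr=True, data=7)
assert sram.cycle(3) == 7
```

A real Verilog model would additionally flag protocol violations, such as address changes or WR assertions while the clock is low, so that logic conflicts show up in simulation.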
The CLK signal is passed to the memory and the corner circuitry develops PCHN (active
low) and WORD (active high) as nonoverlapping signals. For an SRAM of this size, in 0.6-μ
CMOS, the access time from the falling edge of CLK to accessed data appearing at the Q
output terminal can be as short as a few nanoseconds, depending on internal nonoverlap
timing. More relaxed timing is usually acceptable.
The dynamic nature of the SRAM can be carried into the address decoder/wordline driver,
but only if your system will be cycling the memory at some minimum rate. Consider a driving
circuit, like the one in Figure 7.5.

Figure 7.5 Wordline driver schematic.


This is the dynamic approach to a wordline decoder/driver. Devices m2 and m3 drive the wordline,
and the speed of the array can be affected by the sizing of these devices, their output current
driving a potentially large wordline capacitance. The driver is precharged by m1 when the
control line WORD is low, which also turns off m7, breaking the path from the driver input to
ground. SA0, SA1, and SA2 are connected to three address lines or their complements; these
connections are made differently for each wordline driver cell so that each wordline responds
to a unique address. Only the wordline driver that satisfies the requirement of all three series
NMOS devices being on will produce a wordline output that is high. Eight possible wordlines
can be driven with a 3-bit input code; most memories will be larger, with many series NMOS
devices to decode the input addresses to numerous wordline outputs.
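Behaviorally, the decoder is a one-hot function of the address lines; a sketch follows (the per-driver wiring of true and complement lines is abstracted into a bit comparison):

```python
# Sketch of the decoder behavior above: each wordline driver's series
# NMOS stack matches one combination of the three address lines and
# their complements, so exactly one of eight wordlines goes high while
# WORD enables the drivers.

def decode(a2, a1, a0, word=True):
    addr_bits = (a2, a1, a0)
    wordlines = []
    for n in range(8):
        want = ((n >> 2) & 1, (n >> 1) & 1, n & 1)
        # the series stack conducts only if every gate input is high
        on = word and all(bool(b) == bool(w)
                          for b, w in zip(addr_bits, want))
        wordlines.append(on)
    return wordlines

wl = decode(1, 0, 1)
assert wl.count(True) == 1 and wl[5]   # only wordline 5 is driven high
```

With WORD low, every driver is precharged and no wordline is asserted, which is the precharge phase of the cycle.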
This technique works perfectly for continuously accessed SRAMs, but if the cycling
process is ever stopped while the memory is in access mode, with m1 off and m7 on, leakage
currents through m4, m5, and m6 will slowly pull down on the driver input, and a wordline that is not actually addressed can go high, even at room temperature. In systems that will
require a stopped clock, the clock must only stop while the SRAM is in precharge mode (with
WORD low), or wordlines will ultimately go high throughout the array, corrupting data
throughout. Typical time constants for this leakage (and memory corruption) are on the order
of several hundred microseconds at room temperature.
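The droop time scales as C·ΔV/I_leak. The node capacitance, leakage current, and trip voltage below are illustrative assumptions, chosen only to land in the range the text cites:

```python
# Rough estimate of how long a stopped clock can be tolerated before a
# precharged driver input leaks low enough to falsely assert a wordline.
# All three parameter values are illustrative assumptions.

def droop_time(c_node=20e-15, i_leak=50e-12, dv=0.8):
    """Seconds for leakage i_leak to pull node capacitance c_node
    down by dv volts (constant-current approximation)."""
    return c_node * dv / i_leak

t = droop_time()
assert 1e-4 < t < 1e-3   # hundreds of microseconds, as the text notes
```

Leakage roughly doubles every 8 to 10 °C, so the safe stopped-clock window shrinks sharply at high temperature; this is why the clock must only pause during precharge.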
A fully static wordline decoder overcomes these problems, making the SRAM more
“static” in nature. A schematic is shown in Figure 7.6.
This is essentially a four-input NAND gate driving the wordline driver inverter. It is fully static, but consumes more space, especially if a large number of wordlines are in the memory array. Further, a nine-input NAND gate on each wordline driver, as needed to service a 256-wordline array, would probably be taller than the wordline pitch, making the layout
very difficult. A more reasonable solution, one that will be needed in other memories (like
ROMS and DRAMS) that have an even tighter wordline pitch, is one that divides the function
across several adjacent wordline drivers, as shown in Figure 7.7.
I’ve drawn this schematic as gates for convenience, but of course, the actual layout could
be fully custom, not necessarily from your standard cell library.
This allows greater space for the decoding gate, provided that the wordline pitch allows for a three-input NAND gate at each wordline driver. Alternatively, the NAND3 can be made a NAND2, with the NOR7 becoming a NOR8 and the complement of WORD (WORDN) driving the extra input. Also, four wordlines can be gathered by this technique,


Figure 7.6 Fully static wordline driver.

allowing more space for the primary decoder. Further, the lower address lines can be decoded
within the corner cell, sending, say, four decoded lines up the wordline driver array to simplify
decoding at each individual wordline driver.
In very tight memories, such as ROMs, the dynamic technique is used instead, and the
surrounding system is prohibited from freezing the clock in the wrong state, or the data is
registered at the ROM output so that drifting wordlines do not cause problems. For SRAMs, I
suggest the

dynamic decoder approach, with special provision within the system to only pause the memory clock while it is high (and therefore in precharge mode).

Figure 7.7 Alternative schematic representation of static wordline driver.
An alternative solution to the "floating wordline driver" problem is to place a very weak PMOS device into the driver, with its gate tied to the wordline, its source at VDD, and its drain at the wordline driver inverter's input; this will ensure that leakage will not cause faults, but the device's dimensions must be carefully determined by SPICE simulation to ensure that the series N device decoder can properly (and quickly) overcome the added P device's drain current during each access.

The Control “Corner”


The only requirement in terms of nonoverlapping of WORD and PCHN is that the PMOS
precharging devices be fully off by the time a wordline begins to rise, and that all wordlines
are low when the precharge devices begin to turn on; otherwise, a large current can be drawn
through the array. Since such currents are necessary when precharging the bitlines, a process that may take a few nanoseconds, some residual overlap between PCHN and WORD is acceptable, but will increase the power consumption of the SRAM. The timing of the address signals, though, is critical, with transitions of the address signals occurring well within the period during which WORD is solidly off; otherwise, data corruption within the array can
occur. Usually, this is accomplished by clocking the address values into the SRAM using a
timing edge that is centered on the CLK high period. In most systems, the CLK to an SRAM
will be of a lower frequency than that of the system, and such timing can be easily developed
to register addresses. Finally, the WR signal must only be allowed to go true while memory cells are accessed. In some cases, you may wish to modify your I/O circuit so that write is inhibited
during precharge by adding an additional poly gate in series with the two that currently exist in
each leg of the NMOS differential path of the SRAM_BIT_OUT write circuit, tied to PCHN.
Alternatively, such gating can be included into the corner block that accepts the write signal as
an input and drives the WR line into the I/O cells. Of course, the simplest solution is to specify
in the logic circuit that uses the memory to only allow the write command signal at the proper
time.
Referring to the memory timing diagram, PCHN and WORD are both derived from CLK as
delayed and inverted versions. This can be done by a series of three inverters in each case,
with their devices sized to produce nonoverlapping delays. The devices will be drawn after a
simple


SPICE simulation as a guide to device sizing. An example is shown in Figure 7.8.

Figure 7.8 Structured delay circuits for generating nonoverlapping control signals.


The imbalance of device sizes causes the output signals, PCHN and WORD, when each is terminated with a 1 pF load capacitance (a large memory array), to have nonoverlapping response. The delay from the CLK rising edge is about 1 ns for WORD to turn off (low), and about 4 ns for PCHN to go on (low). At the falling edge of CLK, PCHN goes off (high) after about 1 ns, and WORD goes on (high) after about 4 ns. Most applications will be clocking this memory at a fairly slow rate, up to 50 MHz, and this timing is totally adequate. For higher speed
memories, the delays can be made shorter by device sizing.
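The required relationship can be stated as a simple margin check, using the approximate delays quoted above:

```python
# Sketch of the nonoverlap requirement: after each CLK edge, the signal
# turning OFF must finish before the signal turning ON starts. The delay
# values are the approximate ones quoted in the text.

def nonoverlap_margin(t_off_ns, t_on_ns):
    """Positive margin means the off edge safely leads the on edge."""
    return t_on_ns - t_off_ns

# CLK rising edge: WORD turns off after ~1 ns, PCHN turns on after ~4 ns
rising = nonoverlap_margin(t_off_ns=1.0, t_on_ns=4.0)
# CLK falling edge: PCHN turns off after ~1 ns, WORD turns on after ~4 ns
falling = nonoverlap_margin(t_off_ns=1.0, t_on_ns=4.0)
assert rising > 0 and falling > 0    # both windows are nonoverlapping
```

In practice these margins must be verified in SPICE across process, voltage, and temperature corners, since both delays shift together with the inverter chain's drive strength.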
Address lines should be buffered, so that the capacitive loading to the wordline decoders is
as light as possible. This may not be needed for small memories where internal decoding
devices are few, but certainly the address lines are required to be fed into the decoder as both
true and complement values, so at least a single inverter is required on each address line,
which will also be built into the corner. A typical corner schematic for our small memory is
shown in Figure 7.9.
I have drawn the address inverters in this fashion to correspond better to the layout; a
typical inverter section to buffer addresses into


the address decoder or the I/O mux section could look like the layout shown in the color section as Figure C7.8.

Figure 7.9 Typical corner schematic. Delivers signals to wordline driver and I/O sections.
The device sizes used in the address inverter/buffers will depend on the speed you need and
the capacitive loading of the devices they drive. Usually, very small buffers (even smaller than that shown) will suffice for routine memories that cycle at low rates (in terms of MHz). I've used poly to carry the inverted address signals across the second inverters to the block's
output; the resistance of this short poly run is small compared to the inverter output resistance,
especially in a silicide process.
If you work hard on making your I/O circuit blocks really small, and the address
decoder/drivers too, you may find that the corner isn’t large enough to fit the buffers.
Planning the entire memory may require a bit of layout in each block to gain an appreciation
of how the entire memory can be optimized.

Read Only Memories (ROM)


A ROM cell layout, consisting of two rows and eight columns, is shown in the color section as
Figure C7.9.
This design uses M1 for vertical bitlines, but if we had the ability to stack VIAs onto
contacts, we would use M2 for vertical bitlines.


The cell shown is for 16 bits of data. Notice that the second bit from the left, on the upper side
of the diffusion contacts, is programmed. The programming is simply a block of active placed
onto the gap between diffusions, creating a transistor. The upper and lower active areas
constitute the source regions of potential programmed NMOS transistors, and since the
resistivity of Nact is considerable (approx 120 Ω/sq), the length of the span between ground
contacts should be limited in fast designs. The vertical M1 at each end of the cell carries GND.
The ROM operation is very simple: Bitlines are precharged with PMOS devices in the
output section at the base of the ROM design. When a wordline goes high, any programmed
active along that wordline will cause the corresponding bitline to fall toward GND, as the
programming forms an NMOS transistor that will conduct. Unprogrammed locations will
remain high, but only due to charged bitline capacitances.
The poly wordlines are backed by M2 running above the poly horizontally. When this cell is
arrayed, occasional breaks can be inserted into the array columns so that poly can be contacted
to the M2 backing. A few calculations enable us to determine the effects of such backing, and the exercise is illustrative.
Each potential transistor in our ROM is 0.6-μ long and 1.2-μ wide. The capacitance of this
gate area is approximately 1.7 fF. This block is 8 bitlines wide, and such blocks can be placed
on 18 μ centers. A ROM that outputs a data bit from 2 such 8 bitline-wide blocks, and outputs
a 24-bit word in total, will be 48 ROM cells wide, or 864-μ. If all locations are programmed
along a wordline (worst case), we will have a load of 384 transistors, or about 652 fF of load.
An 864-μ long strip of 0.6-μ wide poly will have an end-to-end resistance of about 57.6 kΩ. Since the capacitance is distributed along the line, the time constant of a changing signal at the far end of the poly line will be about one-half the product of the overall resistance and the total capacitance. In our case, this calculates to about 19 ns.
This may be OK for slow designs, but if the data needs to be read out more rapidly, consider
connecting the poly to the metal backing only at each end of the array: The greatest delay will
now be to the center of the wordline, as poly is well contacted at each end. In this case,
however, the resistance from either end to the center is half what it was end to end, and since
we now have two such paths in parallel, the resistance to the center is only 14.4 kΩ. The time constant is now on the order of 5 ns, which may be fast enough for our project. If not, we can
always split the array into a right half and a left half, and contact the poly strip to the M2
backing between the halves. Under these conditions, each poly strip has half the resistance-to-
center that it had in the unsplit design, and the capacitive loading on each side is half of the
original, giving us a time constant of a bit over 1 ns. That should be fast enough for any design.

Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com).
Copyright ©2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
SPECIALTY LOGIC STRUCTURES AND MEMORY Keith Barr 154
In any case, if you do design the ROM with long wordline time constants, beware that the
wordline must be completely at ground before applying precharge. It may be wise to construct
your design so that wordline signal propagation is fast, even in a slow application, just so that
long precharge delays are not required. Such delays will have a tolerance based on process,
supply voltage and temperature, and will need to work well under all conditions.
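The wordline delay estimates above can be reproduced with a quick back-of-envelope script; the resistance and per-gate capacitance come from the text, and the one-half factor is the usual approximation for a distributed RC line:

```python
# Distributed-RC estimates for the three wordline-contacting schemes discussed above.
R_total = 57.6e3          # end-to-end resistance of 864 u of 0.6-u-wide poly, ohms
C_total = 384 * 1.7e-15   # worst case: 384 programmed gates at ~1.7 fF each (~652 fF)

# For a distributed line, the far-end time constant is roughly half of lumped R*C.
t_one_end   = 0.5 * R_total * C_total               # M2 contact at one end:   ~19 ns
t_both_ends = 0.5 * (R_total / 4) * C_total         # contacts at both ends:   ~4.7 ns
t_split     = 0.5 * (R_total / 8) * (C_total / 2)   # halves, center contact:  ~1.2 ns
```

The factor of 4 for end contacting comes from halving the path length to the worst-case point and then paralleling the two paths, exactly as argued in the text.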
Since ROM arrays are only composed of NMOS devices, latchup is not a problem within
the array. We do have a small amount of substrate contacting within the array, but it is not
robust. I suggest surrounding the array with good substrate contacts to prevent latchup due to
other circuits nearby, outside the ROM array.
A group of ROM blocks is shown in the color section as Figure C7.10.
The wordline pitch is only 2.2 μ, which will make wordline decoding and driving a
challenge. Such wordline drivers are probably best built to handle four wordlines at a time.
The wordline driver, with rather small devices, is shown to illustrate one possible routing
solution, as shown in Figure C7.11 in the color section.
Once again, the inability to place VIAs on contacts makes the termination of wordline poly
to M2 a bit messy (on the right side). Notice how the devices are stacked with poly routing,
just to get the wordline driver function to fit the rather narrow wordline pitch. The decoder
will not be detailed, but it is easy to see how it will be a challenge. Making good layouts that
are dense and efficient is a great job for the compulsive puzzle solver.
The output circuits for the ROM can be as simple as an NMOS mux to select one of the
many bitlines. The ROM, however, unlike the SRAM, is purely dynamic; even though a
bitline does not fall due to an unprogrammed cell at that bitline/wordline intersection, it
eventually will fall, due to NMOS device leakage of unaccessed devices working against the
bitline capacitance. The bushold function (shown earlier in the standard cell section) is simply
two weak inverters in a positive feedback loop that can be used at the ROM mux output to
hold the logical state, resisting the effects of leakage within the cell array. Be sure to use
SPICE to simulate the bushold function, to make sure the access of a cell, through the
resistance of the mux devices, can “flip” the bushold when a programmed cell is read out.
Further, be sure to precharge the post-mux signal node.
This suggests that a ROM may be operated by only precharging the selected column, post-
mux, which can be done, provided the address lines to the mux are stable for a good period
prior to precharge turning off. This can reduce the dynamic power consumption of the ROM
and somewhat simplify its design.


ROMs are problematic during verification. Although other memories can have full
schematics that will allow extraction and LVS checking, the ROM, when programmed,
cannot; that is, unless you choose to draw a schematic that details every instance of a
programmed cell. I suggest creating the ROM cell with Nimp surrounding all possible
programming active positions, and actually placing the program rectangles on a separate layer,
which may be called ACT_ROM. This layer will not be recognized by the extraction tool and
will therefore deliver a netlist of an unprogrammed ROM that can be checked against an
unprogrammed schematic. If the layer ACT_ROM is set to tape out on the same GDSII layer
as ACT, the programming will end up in the final ACT mask layer. It is imperative, therefore,
that you check the programming thoroughly by tracing many addresses and looking up the
proper bit condition, making sure your ROM programming is done correctly.

Dynamic Memory (DRAM)


Every book on IC design tells you about DRAM, suggesting it can be designed into a common
ASIC. No, it cannot. At least not the way most texts describe DRAM. Commercial DRAM is
built on very special processes that include features not found in a generic CMOS process. The
commercial DRAM is an array of NMOS transistors that couple capacitors to the bitlines
when accessed by the wordlines. The capacitor’s charge can be forced during write, and
during read a slight disturbance of the bitline potential will indicate whether the bit was a 1 or
a 0, and that value is then solidly written back to the cell. This is the classic single-transistor
DRAM design. The capacitor is a special structure, usually a deep hole etched into a doped
substrate, within which insulation is grown; a subsequent deposition of a conductor within the
hole provides a 3D capacitor structure. A higher capacitance value in the cell results in greater
disturbance to bitline potential when the cell is accessed, and allows more cells to be attached
to a bitline (which also increases bitline load capacitance). In commercial DRAM, the bitlines
are precharged to a mid-supply potential, and during readout, a sense amplifier compares the
selected bitline to a like-precharged dummy bitline to determine if the accessed cell’s
capacitor was charged or not. The control of stray capacitances and the precharge voltage
makes such structures difficult to design, and the absence of 3D capacitors in a
standard CMOS process makes the single-transistor approach unrealistic in an ASIC.

The Differential DRAM


The differential DRAM cell presented here, called DRAM_CELL2, is a pair of cells, an upper cell and
a lower cell, that share common bitline contacts. The vertical M1 lines are the differential
bitline signals, and M2 crosses the cell to carry (from top to bottom) VDD, wordline backing
for the upper cell, GND,
wordline backing for the lower cell, and VDD again. This cell can be arrayed on a 4-μ wide by
8-μ high pitch. Each bit occupies 16 μ2 of space. Recalling our SRAM cell was 6 μ by 11.8 μ,
with an area of 70.8 μ2, we can fit about 4.4 times as many bits of DRAM into the same space
as SRAM. There are, however, limitations on how large the array may be vertically, and the
I/O circuit for the DRAM will be more complicated. For most designs, taking overhead into
account, the DRAM solution is about one-fourth to one-third the size of SRAM.
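The raw density claim is a one-line sanity check (cell area only, ignoring the I/O overhead discussed above):

```python
# Raw cell-area comparison between the SRAM and differential DRAM cells.
sram_bit = 6.0 * 11.8        # SRAM cell area, square microns per bit (70.8)
dram_bit = (4.0 * 8.0) / 2   # 4 x 8 u cell pair stores two bits: 16 u^2 per bit
density_gain = sram_bit / dram_bit   # ~4.4x, before I/O and array overhead
```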
DRAM is slower than SRAM, and is only used in signal processing applications that require
large amounts of memory that is continually being accessed, as the cells of DRAM must be
refreshed continuously. For an ASIC variety of differential DRAM, the maximum time
between cell accesses should be no longer than a few milliseconds. Because all cells along an
activated wordline are refreshed simultaneously, only one cell in an entire row needs to be
accessed for the entire row to be refreshed.
The dual cell schematic is shown in Figure 7.10 and the layout is shown as Figure C7.12 in
the color section.

Figure 7.10 DRAM dual-cell schematic.

NMOS devices with their gates connected to VDD are effectively capacitors; the remaining
devices are access transistors. Depending on your layout tool's ability to recognize
MOSFETs with a common source/drain terminal, you may need to schematize the cell as
capacitors in place of the NMOS devices, and create a recognition layer to draw onto those regions,
perhaps named MOS_CAP. Using the Tanner tools, I have needed to convert my schematic to
reflect capacitors instead of MOS caps.
SPICE simulations, however, need to be performed with real MOSFETs, at least initially,
so that the simulation of a single cell is more accurate.
The DRAM_TIE cell (Figure C7.13) is bulky, and does not allow VDD to be carried
vertically. The VDD line is attached to the cell caps, however, and the current through any
one strip is rather small. Good contact of the horizontal M2 VDD lines at each end of the array
is adequate. Although the DRAM_TIE cell is wide, usually the core that it abuts is very wide,
perhaps 32 to 64 cell columns, and the resulting overhead is small. This cell layout is shown in
the color section as Figure C7.13.
The poly gate of the MOS cap must be at VDD potential, to create an inverted region below
the gate, which conducts to the access transistor. The size of the MOS cap in the layout is 1.2
μ by 1.5 μ, which has a gate capacitance of about 4.5 fF, and the N diffusion between the
MOS cap and the access device contribute another 2.5 fF, leaving a total cell storage
capacitance of about 7 fF. The difference between the voltage on a charged cell and a
discharged cell is about 3 V, so the charge coupled to a bitline during read from a discharged
cell is about 21 fC.
In operation, the bitlines are precharged to 5 V before a read cycle. When the differential
cell is accessed by a wordline going high, one side of the cell will have a low stored potential
that will cause the attached bitline to fall, charging that storage capacitance. The other side of
the cell will have its capacitor already charged, and will not affect its bitline potential. The
difference in potential between the two bitlines will indicate the previously recorded state of
the cell. The differential signal that must be sensed is fairly small, on the order of a few
hundred millivolts.
The differential read signal will be amplified in the I/O section by very simple sense amps
that have a statistically distributed offset voltage, and the read signal must always be greater
than this offset, or incorrect data could be read from the cell. The magnitude of the read signal
will depend on the capacitance along the bitline; for this cell, a bitline capacitance of 100 fF
will result in a signal that is approximately 210 mV, which is large for a DRAM, and easily
sensed. Large sense signals allow the sense amplifiers to operate quickly in high speed
applications; smaller sense signals can be tolerated if the sense process is slower and gentler.
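The read-signal arithmetic above can be sketched directly; the 100 fF bitline is the loading figure assumed in the text:

```python
# Charge delivered by a discharged cell and the resulting bitline swing.
c_cell = 4.5e-15 + 2.5e-15   # MOS cap gate + diffusion capacitance (~7 fF total)
dv_cell = 3.0                # difference between charged and discharged cell, V
q_read = c_cell * dv_cell    # ~21 fC coupled to the bitline during read

c_bitline = 100e-15          # assumed bitline capacitance, F
v_sense = q_read / c_bitline # ~210 mV differential read signal
```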
The bitline loading will be from the M1 parasitic capacitance to other layers, as well as the
diffusion contacts that are common to every cell pair. The capacitance of this diffusion, a 1.2 μ
by 1.4 μ feature, depends on the applied voltage as reverse potentials cause a widening of the
depletion region. When the bitlines are precharged to full supply, the capacitive loading of
the bitline diffusions is minimal, approximately 1.5 fF per contact. The
capacitance of the diffusion in the storage cell, however, is greatest when its potential is low;
this is a perfect situation for a discharged cell to affect the bitline capacitance.
Diffusion capacitance is calculated by the SPICE parameters CJ, CJSW, MJ, and MJSW,
along with the potentials PB and PHP. These parameters allow the calculation of capacitance
versus voltage. The parameters CJ, PB, and MJ pertain to the area of the junction, while
CJSW, PHP, and MJSW pertain to the sidewall. The diffusion gradient is steeper on the
sidewall because of the sharp edge masking by poly and FOX during implant, while the
bottom of the diffusion is more gradual. The total capacitance is the sum of these two
independent values. CJ and CJSW are the capacitance values at zero bias, the MJ and MJSW
parameters indicate how the capacitance varies with applied potential. Consult your SPICE
manual if you need the equations. Figure 7.11 illustrates the relationship between diode
capacitance and reverse voltage.

Figure 7.11 Variation in junction capacitance with applied reverse voltage.
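The junction-capacitance model described above can be sketched as follows. The parameter values here are placeholders for illustration only, not process data; take the real CJ, CJSW, MJ, MJSW, PB, and PHP values from your SPICE model file:

```python
# Reverse-biased diffusion capacitance: independent bottom (area) and sidewall terms.
def junction_cap(v_rev, area, perim,
                 CJ=1.0e-3, PB=0.8, MJ=0.5,          # bottom terms: F/m^2, V, unitless
                 CJSW=2.0e-10, PHP=0.8, MJSW=0.33):  # sidewall terms: F/m, V, unitless
    c_bottom = CJ * area / (1.0 + v_rev / PB) ** MJ
    c_side = CJSW * perim / (1.0 + v_rev / PHP) ** MJSW
    return c_bottom + c_side

# A 1.2 u x 1.4 u bitline contact diffusion, at zero bias and at full 5 V precharge:
area, perim = 1.2e-6 * 1.4e-6, 2 * (1.2e-6 + 1.4e-6)
c_low = junction_cap(0.0, area, perim)   # capacitance is largest near zero bias
c_high = junction_cap(5.0, area, perim)  # and smallest at full reverse bias
```

This reproduces the behavior of Figure 7.11: capacitance falls monotonically as the reverse bias widens the depletion region.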


The M1 lines have a capacitance to other layers that is, of course, not dependent on bias
potential, and for this process the bitline capacitance due to metal is about 1.3 fF per cell pair.
The total bitline capacitance per cell pair is then about 2.8 fF. A column of 32 of these cell
pairs (64 wordlines) would show a bitline capacitance of about 90 fF. There will be additional
capacitances in the I/O section, but we will find that they can be made reasonably small.
A portion of the I/O section can be drawn to illustrate the sense amp configuration and the
control signals that are required, shown in Figure 7.12.
The layout of a section of this circuitry is shown in the color section as Figure C7.14.
Although not immediately obvious, there is a synergy in this scheme that needs to be
revealed. Devices m2 and m3 precharge the bitlines to VDD when PCHN is low. MO and MON
are the mux outputs, which will be connected to all of the other mux outputs in the bit block as
differential signals, and only one MUX signal will be active at a time. The mux outputs MO
and MON are also precharged by the PCHN control, but I have not indicated this in the
schematic. As the result of MO, MON, BL, and BLN all being at VDD potential after
precharge, the mux devices m7 and m8 will be off even if MUX is high, and only their
source/drain diffusion capacitance will load the bitlines. M5 and m6 are cross-coupled NMOS
devices that constitute the sense amp.

Figure 7.12 Precharge, sense, and MUX devices at bottom of each DRAM bitline pair.

After PCHN goes high, turning
off the precharge devices, and a wordline in the cell array has gone active, a slight difference
in potential will exist between BL and BLN. At this point, the control signal SENN will begin
to fall (driven by the corner circuitry) toward GND, and the difference in potential will be
amplified by the cross-coupled NMOS pair m5 and m6.
At the beginning of sense, let’s say BL is at 5 V, but BLN is at 4.9 V. When SENN falls to
about 3.3 V, m6 will begin to conduct, since its gate potential is 100 mV higher than that at
m5. M6 will pull down yet further on BLN, quickly prohibiting m5 from ever turning on. By
the time the lowest potential bitline begins to turn on the cross-coupled PMOS devices m1 and
m4, the differential signal has been amplified from the 100 mV starting point to at least 500
mV. Therefore, although the primary sense devices m5 and m6 need to have good offset
characteristics, m1 and m4 do not. All of the drawn PMOS devices can be of minimum
dimension, L = 0.6 μ and W = 1.2 μ. Therefore, the gates of the cross-coupled PMOS devices
will have only a slight loading effect on the bitlines, less than 2 fF. The NMOS sense amps,
however, must be made larger so that they will have a small offset voltage. The upshot is that
the loading capacitance in the I/O block is largely the result of the NMOS sense amp pair, whose
size controls the offset of the sense amp; since the signal from the array must overcome this
offset, loading the bitlines with extra sense amp capacitance (larger, better-matched devices)
can actually reduce the charge required to read a cell properly.
A portion of the precharge/sense amp layout for the differential DRAM is shown in Figure
C7.14. It is complicated, using traditional but twisted layout forms. Notice the bitlines are carried
through the amplifiers with poly, which will produce voltage-drop errors if the currents are
large through these paths. Analysis will show however, that at the onset of sense, the currents
are small to the extent that these voltage drops are millivolts at most, provided SENN is not
slammed on (going low) too quickly.

Sense amp statistical offsets


The statistical variation in MOS devices, which is the variation in threshold voltage from one
transistor to another in a layout, depends on device gate area. A general rule of thumb is that
two devices each with a 1 square micron gate area will have a 1 sigma Vt variation of about 10
mV. Further, the variation scales as the inverse square root of gate area: two NMOS transistors,
each with a 4 μ2 gate area, will have a 1 sigma Vt variation of 5 mV, and a pair of 1 μ × 100 μ devices would
have a 1 sigma variation of 1 mV. With this information, we can think through how bad the
offsets can get, despite the rather optimistic first impression of 1 sigma values.


The offset of a differential pair of transistors can be positive or negative, but the mean will
be zero offset. A normal statistical distribution of amplifier offsets can be drawn as a bell
curve, with zero down the center (at the curve’s peak), and offsets become increasingly
improbable as we go along the curve to the right (positive offset), or left (negative offset) from
the center. The 1 sigma points indicate that 68% of the fabricated differential pairs will fall
within these bounds. The devices that will fall within the 2 sigma points (twice as large an
offset) include 95.5% of all fabricated devices, and so on.
Differential pairs, 1 μ2 gate area per device:

Sigma    Offset limit    % Good       % Bad
  1        10 mV         68.26        31.74
  2        20 mV         95.45         4.55
  3        30 mV         99.73         0.27
  4        40 mV         99.9937       0.0063
  5        50 mV         99.99994      0.00006

The table shows that a pair of 1 μ2 devices will all be within the 50 mV offset limits, with
the exception of perhaps 1 in 1.7 million. If the differential pairs are 4 μ2 each, we would see
this level of accuracy within 25 mV limits. Consider, however, that you may have thousands
of such sense amps in a design, and only one needs to fail from an offset problem to make the
entire chip useless. You might imagine that the above chart can also be used to great
advantage when designing analog circuits.
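The table above follows directly from the normal distribution, and a short script shows how per-amp statistics compound into chip yield; the sense-amp count here is an assumed example, not a figure from the text:

```python
import math

def sigma_vt_mv(area_um2):
    """1-sigma Vt mismatch of a device pair, mV: ~10 mV at 1 um^2, ~1/sqrt(area)."""
    return 10.0 / math.sqrt(area_um2)

def p_within(k_sigma):
    """Probability a pair's offset falls within +/- k sigma (normal distribution)."""
    return math.erf(k_sigma / math.sqrt(2.0))

# 1 um^2 devices (sigma = 10 mV) with a 40 mV offset budget is a 4-sigma limit:
p_good = p_within(40.0 / sigma_vt_mv(1.0))   # ~99.9937% per sense amp
n_amps = 4096                                # assumed sense-amp count for a large array
chip_yield = p_good ** n_amps                # ~77%: one marginal amp spoils the chip
```

This is why "only one needs to fail" matters: a per-amp failure rate that looks negligible is raised to the power of the number of amps on the die.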
There certainly are other noise mechanisms on-chip, which may require a greater margin;
the charge within the cell may decay with time due to leakage, providing a much smaller than
ideal output sense signal, or substrate currents may cause differences between two devices,
simply because they are not in the exact same place. Practical DRAM designs will have a 1
sigma offset of 3 to 4 mV and a sense signal of at least 100 mV. Commercial DRAM works in
the 120 to 150 mV range, but it would appear that much smaller signal voltages could be
designed, considering the potentially low offset of sense amps. Experimentation will be
required to determine just how small the sense signal can be.
My experience is that the predictions of amplifier offset are optimistic for the illustrated
layout; perhaps due to slight asymmetry in the layout, perhaps due to incomplete analysis of
bitline loading, perhaps due to process variations that were not encountered when device
matching test layouts were evaluated. The sense signal should be on the order of 150 mV, with
a calculated worst case sense amp offset on the order of 40 mV. Despite offset calculations,
my experience is that cutting the noise margin closer can lead to disappointing results.


Soft errors
When DRAM was first developed, the structures were very large, and each DRAM IC would
hold only 4K to 16K bits, with a cell size on the order of 500 μ2 per bit. It was discovered
that dynamic memories suffered from soft errors; a bit would occasionally read back
incorrectly, but otherwise, once rewritten, seemed to work fine. The trouble was tracked back
to alpha particles generated by trace radioactive elements in the packaging material. Today it
is known that cosmic radiation and highly energetic protons and neutrons can upset memories,
despite the use of very pure packaging materials. The theory at the time, based on tests with
DRAMs exposed to 5 MeV alpha emitters (the presumed packaging material contamination),
was that each DRAM cell must store at least 50 fC of charge to resist upset by alpha particles
of this energy.
The differential DRAM cell illustrated here may in fact suffer rare occasions of alpha
particle upset, but more recent research indicates that extraterrestrial radiation has the
capability to upset DRAMs even if constructed with large charge storage capability. In fact,
even SRAM and logic circuits can be disrupted under the right radiation conditions. As a
radiation particle enters the silicon substrate after passing through many layers of metal and
dielectric, it penetrates deeply, and generates minority carriers that can travel considerable
distances before being absorbed by a junction as a current. As DRAM cells become smaller,
they offer a smaller area through which such carriers may be collected. Further, the differential
DRAM has both cells in close proximity, as opposed to the traditional single cell design where
the output signal from a cell is compared to a dummy cell that may be some distance away. If
you are concerned about rare but eventual soft errors, you may wish to increase the height of
the differential DRAM cell, increasing its storage capacitance accordingly.

DRAM Timing
The timing requirements of the DRAM are more complicated than the SRAM, but only
because of the extra sense amp signal SENN. Figure 7.13 illustrates this.

Figure 7.13 DRAM timing signals.


The signals can be generated with delays and simple gates; on the falling edge of CLK,
PCHN is immediately pulled high, then after a short delay, WORD is brought high and a weak
driver begins to pull low on SENN. The rising wordline will deliver the cell’s charges to the
bitlines immediately, as the time constant of the access transistor resistance and the cell
capacitance is extremely short.
If SENN is brought low more gently, then the sense devices will behave based on their
threshold voltages; if brought low abruptly, other differences in the two sense devices, such as
source contacting resistance differences or poly routing resistances, can cause false sense
amplification. Further, the matching of devices goes beyond mere threshold voltage; the beta
of the devices is also statistically variable from device to device. Beta is the drain current for a
given device geometry, fitted to a (Vg–Vt) squared curve. The abrupt onset of SENN will
reveal these device differences, in addition to the threshold variations previously discussed.
On the rising edge of CLK, PCHN is brought low more slowly, so that the entire
capacitance of the array can be brought to VDD more gradually; otherwise, the peak current
required for precharge can cause ASIC-wide problems. DRAM is very dense, and many
applications will use DRAM because a large amount of memory is required. If slowing the
precharge of a large DRAM array is not considered, peak VDD currents can be in the ampere
range, persisting for only a few nanoseconds; this is difficult for lead inductances to supply.
SRAMs suffer the same problem, but the DRAM is much more dense and will require more
peak precharge power for a given memory layout size.

Leakage currents and storage time


Logic processes that you may wish to use in fabricating a DRAM will have transistor
threshold voltages that are adjusted to satisfy different needs. A purely logic process will
typically have lower threshold voltages, on the order of 0.65 V (0.35 μ), while a process for
mixed signal designs may have a higher threshold, on the order of 0.8 V. Lower threshold
devices conduct a bit better when on, driving higher output currents, but higher threshold
devices conduct less current when off.
Figure 7.14 shows a plot of drain current versus gate voltage, swept from 0 to 1 V, for an L
= 0.6, W = 1.2-μ NMOS device with a threshold voltage of 0.65 V. We can see how the
current increases with gate voltage at about 1 decade of drain current for every 90 mV of gate
voltage. This is the subthreshold region of MOSFET operation. Notice the residual drain
current when Vg = 0; the drain current is on the order of 1 pA.

Figure 7.14 Plot of NMOS device leakage vs. gate voltage.

When the bitlines of a differential DRAM are pulled to VDD, and the wordlines are all off,
the access transistors are also off. However, when the bitlines are pulled to GND by the sense
devices, every access transistor along the grounded
bitline will be leaking this small current from its attached storage capacitance. Therefore,
many accesses from a given wordline row will slowly degrade the charge from ALL other
cells in the array.
When you do research on DRAM for ASICs, you may run into articles where the data
retention time is measured to be quite reasonable, on the order of hundreds of milliseconds,
but are these tests done by actually running the DRAM, or just precharging the array, waiting
for a while, and then testing the data integrity? The point is, in practical application, the access of
cells will degrade the data integrity of all the other cells in the array, leading to much shorter
data retention times. This can be overcome by forcing a high refresh rate on the DRAM; or the
mechanism, once understood, can be thwarted at its origin: If the sense amp signal
SENN is never allowed to go below a few hundred millivolts, the leakage currents can be
reduced by several orders of magnitude.
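A rough model of the mechanism, using the 1-decade-per-90-mV subthreshold slope and the ~1 pA residual leakage quoted above; the 1 V of tolerable droop is an assumption for illustration:

```python
# Subthreshold leakage of an access transistor vs. its source (bitline) potential.
S = 0.090        # subthreshold swing, V per decade
I0 = 1.0e-12     # drain leakage at Vgs = 0, amps (~1 pA, per Figure 7.14)
C_cell = 7.0e-15 # cell storage capacitance, F
dV = 1.0         # assumed tolerable droop on the stored charge, V

def leak(vgs):
    return I0 * 10.0 ** (vgs / S)   # one decade of current per 90 mV of Vgs

t_grounded = C_cell * dV / leak(0.0)   # bitline held at GND: ~7 ms retention
t_clamped = C_cell * dV / leak(-0.3)   # bitline clamped 300 mV above GND: ~2000x longer
```

Clamping SENN a few hundred millivolts above ground makes Vgs of every unselected access transistor negative, which is where the orders-of-magnitude leakage reduction comes from.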
A circuit that clamps SENN to a safe voltage could look like Figure 7.15.

Figure 7.15 Circuit to clamp the sense amp control signal, reducing leakage in the memory
array.

M1 and m2 are weak PMOS devices that pull light currents when the sense command signal
SEN_ONN falls. A voltage is produced across m3, which acts as a bias source for m4. M2 will
pull up on the gate of the SENN driver m6, and SENN will begin to fall. When SENN falls to a
low enough voltage to allow m4 to begin to conduct, it will begin to pull down on the gate of
m6, and the process will settle at a SENN potential of a few hundred millivolts. The final SENN
potential will depend on the dimensioning of m1, m2, m3, and m4. M5 always ensures that
SENN is brought to VDD during precharge. Do not attempt a differential DRAM without this
circuit feature.
Additionally, the write circuits in each I/O block that pull a mux output line to GND should
be connected to SENN instead; bitlines in a differential DRAM should never be pulled all the
way to ground, especially in a process that has low NMOS threshold voltages; unacceptable
data retention times will result.
Finally, although the sense amp dimensions and the cell sizes can be determined by
calculations that show statistical offsets across the sense amps of a memory array, there are
stray capacitances that must be considered, ones that are not usually revealed when extracting
your design. If we consider all sense amps to be operating independently, the design could
show unexpectedly poor yield; in fact, due to coupling capacitances between adjacent bit
pairs, the activity of one sense amp could affect a neighboring one.
We would normally consider the difference in threshold between devices within a sense
amp pair, but consider the possibility of one pair having a high voltage threshold, with a
neighboring sense amp pair having a low threshold voltage. The pair with the low threshold
will begin sensing, amplifying the difference between its bitlines, prior to the high threshold
voltage neighbor. Since the voltage gain is significant, and coupling capacitance between
adjacent lines can be large, the low threshold bitline pair can couple to the high threshold
bitline pair and cause an upset that exceeds the sense signal, causing a fault. The best remedy
for this is to make bitlines of a pair close together, minimizing the stray capacitance to
neighboring bitline pairs.


Other Memories
If you're interested in the sequential-latch circular memory, a simplified schematic that
illustrates how it works is shown in Figure 7.16. I will leave to you the pleasure of working
out a tight layout.

Figure 7.16 Differential element of a circular memory array.

It is a dynamic structure with a short data retention time, but in high-speed
interpolators and decimators such structures refresh the cells continuously, and refresh time is
less of an issue.
Because the devices are all NMOS, the rules applying to the N well structure do not get in
the way of a tight layout. M2 and m3 are storage capacitors, and can be made quite small, on
the order of a few square microns. The cells are intended to be abutted vertically so that
OUTP, INN, INP, and OUTN run vertically through the array, and the horizontal line WWR
becomes the WRD line of a cell placed above this one. In this fashion, when the WWR line is
activated in one cell, the WRD line is simultaneously activated in the cell that lies just above
this one.
The cell is differential, and when accessed, one output will be pulled low and the other left
high, depending on the cell’s written state. The I/O circuitry is shown in Figure 7.17.
M5 and M6 form a cross-coupled PMOS pair that ensures a solid logic level to the data
output buffer, M7 and M8. M5 and M6 are designed to be weak devices, so that the currents
pulled by the devices in the array can overcome any static condition that may exist from the
previous access. If this detail is considered, no precharging is required. The input circuit is
simply a pair of inverters that delivers a differential signal for writing to the adjacent memory
cell.


Figure 7.17 I/O section of differential, circular memory.

The wordlines should be nonoverlapping, but will also always be sequentially enabled, so
the wordline driver can be a counter and a series of gates; the wordlines are forced to ground
by a rising clock signal that then clocks the counter, perhaps through a short delay. When the
clock is low, the selected wordline is activated. A wordline driver such as this only requires a
single clock line to repeatedly step through the accessing of all of the memory elements, which
is particularly convenient. The cell size is about that of an SRAM cell, but the simplified I/O
and the automatic addressing scheme make this a particularly attractive block in a complicated
signal processing function.
Now that we have covered some differential techniques, you might consider fully static
ROMs that operate differentially; although two programmed cells would be required for each
bit, the resulting structure could operate at extremely high speeds in a truly static mode,
without clocking.

Experimental Nonvolatile Structures


You may be making a sensor that needs calibration parameters, or a device that needs to
include a security code. These applications require programmability, just once during
manufacture, and few bits are required—you don’t need a huge FLASH array, just a few
programmable bits.
UV erasable EPROMs use a floating gate technology, where two levels of polysilicon, one
stacked onto the other, insulated from each other by a thin oxide layer, are used as the gate of
an NMOS transistor. The lower poly is the floating gate, and its potential will determine the
transistor’s conductivity. The upper poly layer is the wordline that spans the array, and when
brought high, will capacitively couple a positive potential to the floating gate, turning the
memory cell’s transistor on. If the floating gate is sufficiently negatively charged relative to the wordline poly, then the transistor
can never turn on.
In a UV EPROM, the charge between the two poly layers is brought to zero by exposure to
UV light, which has a high enough photon energy to create electron-hole pairs in the
insulator that separates the poly layers. Collection of these charges depletes the charge
between the poly layers. In the absence of such high energy photons (at least 3.1 eV, or a
wavelength less than 400 nm), the silicon dioxide insulation is nearly perfect; data can be
stored as a charge of 10,000 to 20,000 electrons on the lower poly to keep that cell from
conducting, and this condition can persist for decades without the charge dissipating. This is
the programmed state of the cell.
Programming is accomplished in an EPROM by the hot carrier effect. At the drain junction
of an N MOSFET, when stressed with both high gate and drain potentials, the electrons
flowing from source to drain encounter a high electric field at the drain junction’s depleted
region, and produce electron-hole pairs due to impact ionization. The NMOS devices, which
are more prone to this effect than the PMOS devices, will have a lightly doped drain (LDD)
feature to minimize the effect; the LDD will lower substrate currents that can be detrimental to
circuit function, while improving device reliability. Without the LDD feature, device
characteristics can degrade over time at high operating voltages and temperatures.
The LDD structure is cleverly developed, as shown in Figure 7.18.

Figure 7.18 Lightly doped drain process steps.

After the gate poly layer is deposited and patterned, a shallow, low-density implant is
performed, which can function as source/drain regions, but does not conduct current well
enough for good connection to source/drain contacts. An oxide spacer layer is then deposited
isotropically, uniformly covering the entire wafer. Next, an anisotropic etch is performed,
which removes the oxide in the direction perpendicular to the wafer’s surface only; the
remaining spacer material on each side of the gate poly is
then used as a mask during a denser implant. This allows the doping at the source and
drain contacts to conduct current very well, while the regions immediately adjacent to the
gate retain only the LDD implant dose. This provides a somewhat graded junction into which
the electrons flow under high current stress (it’s like electrons hitting a pillow instead of a
brick wall); less impact ionization occurs, and fewer electron-hole pairs are generated in the
drain region.
You may encounter an LDD mask layer in the process you choose; it is usually a copy of
the N implant mask, but allows the selective removal of the LDD feature in certain regions of
your design. It is also a pain in designs where you want the LDD feature everywhere; you
must draw it on every device or have it generated prior to tapeout, significantly increasing the
database size.
The LDD feature does not eliminate impact ionization; it just increases the severity of the
required conditions for it to occur. During impact ionization, the gate potential is positive and
large drain currents flow. Generated holes flow through the P substrate as currents directly,
while generated electrons constitute minority carriers injected into the substrate. Some of the
generated electrons will have sufficient kinetic energy to make their way through thin oxide
(TOX) and be collected by the positively charged gate. These electrons are called hot
electrons. If the gate is of the floating type, these electrons will accumulate on the gate,
lowering its potential. Through this mechanism, the floating gate of a UV erasable EPROM is
programmed.
Such structures are difficult to fabricate reliably in standard CMOS, since the conditions for
hot-electron programming are generally on the verge of avalanche breakdown; if you provide
a programming means that requires high voltages, the margin between a potential that does
the job and the potential that kills your chip may be very small. Making use of the LDD option,
removing it where impact ionization is desired, may widen this margin, allowing
programmable memory.
Floating gates of the double poly version are not allowed in most commercial CMOS
processes, as poly2 is not allowed to overlap poly and active simultaneously. We can,
however, make floating gates, even in a single poly process.
In Figure 7.19, I’ve drawn the PMOS device to indicate its body (well connection) to be
attached to the source and drain as well; effectively it is a capacitor that couples to the gate of
the NMOS device. The gates (FG) are floating; that is, they are not connected to any diffusion
that could leak away charge. This structure can be expected to hold a charge indefinitely,
although it is not as compact as the stacked, double poly version.


Figure 7.19 Floating gate structure in standard CMOS process.

The process you choose could have a “tunnel” oxide layer, which could be used to make
EEPROMs. The nature of very thin silicon dioxide layers is that conduction will be possible,
although at low current densities, at voltages lower than the voltage that causes breakdown.
The tunnel mechanism is quite abrupt, occurring at approximately 10 million V/cm; tunneling
for 0.35-μ process gate oxides (7 nm) begins at about 8 V. If your process offers a very thin
oxide layer, on the order of 3 to 4 nm, reasonable voltages can be used to program and erase
small memory arrays.
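Since the onset voltage is simply the critical field times the oxide thickness, the numbers above are easy to sanity-check. A quick sketch (Python, illustration only; the 10 MV/cm figure is the approximate critical field quoted above):

```python
# Rough tunneling-onset estimate: onset voltage is the critical field
# (about 10 MV/cm, the approximate value quoted above) times oxide thickness.
E_TUNNEL_V_PER_CM = 10e6  # approximate field at which tunneling becomes abrupt

def tunnel_onset_voltage(t_ox_nm: float) -> float:
    """Voltage at which tunneling begins for an oxide t_ox_nm thick."""
    return E_TUNNEL_V_PER_CM * (t_ox_nm * 1e-7)  # 1 nm = 1e-7 cm

v_7nm = tunnel_onset_voltage(7.0)   # 0.35-um gate oxide
v_35nm = tunnel_onset_voltage(3.5)  # a thin tunnel oxide, roughly half the onset
```

Taking the field at face value gives 7 V for a 7-nm oxide, in the neighborhood of the 8-V onset quoted above; the critical field itself is only approximate.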
The snapback phenomenon illustrated in the chapter on protection devices could be used to
advantage, if robust, long gate devices are used as surrounding drivers. A small NMOS device
could be intentionally damaged by excessive currents, causing a short in the delicate drain
diffusion region.
The abutment of the N and P implants on active can form a low-voltage zener diode that
could also be used in “damage” mode, and at potentials that are relatively normal for the
surrounding circuitry.
Embarking on a new memory concept that would require special features could require
significant work, but if valuable results are expected, you may find your fab to be helpful—
they may be able to benefit from the information returned from your tests. After all, a fab has a
limited engineering budget, but may be very interested in your experimentation. When
troubling the fab’s engineers for data on special structures, you may get a friendlier response
if you indicate that you are willing to share the results of your tests.


Logic, Binary Mathematics, and Processing


High-level programming languages used so frequently today can isolate the programmer
from the gritty details of binary mathematics, yet a full understanding of the nature of
binary math is essential for those who wish to design their own processing structures,
especially those who wish to make circuits that contain novel concepts. I’ve therefore
included this chapter to help bridge the gap between standard cells and memories, where
signals carry only single-bit meanings, and analog circuit design in Chapter 9, which deals
with a continuum of quantifiable values. When designing analog circuits, you will often
merge them with binary computing machines, the design of which requires a set of binary
math techniques.
By no means is this section exhaustive; I strongly recommend the widely read Theory and
Application of Digital Signal Processing, written by Rabiner and Gold in 1975.
Technology has certainly changed in the 30 years since its publication; the underlying
principles, however, have not. It is an excellent text, and includes the original FORTRAN
program written by Parks and McClellan for approximating equiripple FIR filter
coefficients via the Remez exchange algorithm. The text assumes you already have a
good grounding in binary mathematics, which, even today, few engineers possess.

A Logic Primer
The basic gate function is the NAND or NOR, from which all other functions can be
developed. Any simple logic function is inverting, due to the characteristics of the devices
used to create the function in hardware. NAND or NOR gates can be made into inverters.


The NOR function can be formed from the NAND function.

Multiple input gates can be formed from simple 2-input gates.

The XOR gate accepts two inputs and produces a true output if the inputs are in
opposite logic states.


The XOR gate will act as an inverter when one input is high and as a noninverter when one
input is low. It can be used to selectively complement a bit.
The term true means “high” (VDD) in our case, although we could define logic as
negative-true, where true means low (GND). False is the opposite of true in either convention.
The mux function:

Input A is selected when S is low, input B is selected when S is high.
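Behaviorally, the mux and the XOR's selective-complement property can be stated in a few lines. The sketch below (Python, purely behavioral; no connection to any particular cell library) verifies both truth tables:

```python
def mux(a: int, b: int, s: int) -> int:
    """2:1 mux: input A when S is low, input B when S is high."""
    return b if s else a

def xor(a: int, b: int) -> int:
    """True output when the two inputs are in opposite logic states."""
    return a ^ b

# XOR as a selective complement: one input chooses pass-through or invert.
assert all(xor(bit, 0) == bit for bit in (0, 1))      # acts as a noninverter
assert all(xor(bit, 1) == 1 - bit for bit in (0, 1))  # acts as an inverter
assert mux(0, 1, 0) == 0 and mux(0, 1, 1) == 1        # S selects A, then B
```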


The half adder uses the derived XOR function.


As does the full adder:

Of course, these functions are not built from simple NAND gates in practice, but it is both
interesting and useful to see how they can be. CMOS standard cells will use more condensed
arrangements of NMOS and PMOS devices to accomplish the functions in a more compact
space, with shorter propagation delays.
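To make that point concrete, here is a behavioral sketch (Python) of a full adder assembled from nothing but 2-input NANDs; this particular nine-gate arrangement is one textbook decomposition, not the condensed CMOS cell just mentioned:

```python
def nand(a: int, b: int) -> int:
    """The basic 2-input NAND gate."""
    return 0 if (a and b) else 1

def xor_nand(a: int, b: int) -> int:
    """XOR built from four NAND gates."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def full_adder(a: int, b: int, cin: int):
    """Full adder from NAND gates only (nine in all)."""
    p = xor_nand(a, b)                     # a XOR b
    s = xor_nand(p, cin)                   # sum bit
    cout = nand(nand(a, b), nand(p, cin))  # carry = a*b + cin*(a XOR b)
    return s, cout

# Exhaustive check against ordinary binary addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```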
All of the above are “static” functions, while the flip-flops (registers) are “dynamic,”
responding to the changing clock signal.

The D flip-flop can be constructed from the simplest gate functions, and set and reset
terminals can be added with extra inputs on the gates. Unlike their TTL predecessors, CMOS
flip-flops use the transmission gate, a switch that can be controlled to conduct selectively.


The transmission gate:

Signals will conduct bidirectionally between A and B when S is high and SN is low.
Opposite control signals will cause the transmission gate to be off, blocking conduction
between A and B.
The CMOS latch uses these transmission gates internally.

This latch is transparent low; that is, the D input is transferred to the Q output while C is
low; when C goes high, the last condition of Q is retained, and Q becomes independent of the
logic condition of D.
The CMOS D flip-flop is essentially two latches, with the clocking signals to the
transmission gates out of phase.


The state of the D input is allowed into the first latch when C is low, while the second latch
is forced to hold its condition. When C goes high, the D path to the first latch is broken, and
the path between the two latches is enabled; the previous D value is therefore transferred to the
Q output.
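The two-latch action is easy to trace in a behavioral model. The sketch below (Python; a model of levels and phases, not a gate-level netlist) treats each latch as transparent-low, with the slave's control inverted:

```python
class Latch:
    """Transparent-low latch: follows D while C is low, holds while C is high."""
    def __init__(self):
        self.q = 0
    def update(self, d: int, c: int) -> int:
        if c == 0:       # transparent: Q follows D
            self.q = d
        return self.q    # opaque: Q holds its last value

class DFlipFlop:
    """Two latches clocked out of phase: Q takes D's pre-edge value
    when the clock rises."""
    def __init__(self):
        self.master = Latch()
        self.slave = Latch()
    def update(self, d: int, c: int) -> int:
        m = self.master.update(d, c)        # master transparent while C is low
        return self.slave.update(m, 1 - c)  # slave transparent while C is high

# While C is low the master follows D; on the rising edge that value
# passes through the slave to Q, and later changes of D are ignored.
ff = DFlipFlop()
ff.update(1, 0)              # C low: master captures D = 1, Q still holds
q = ff.update(0, 1)          # rising edge: Q becomes the pre-edge D value, 1
```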
The tristate buffer can use the transmission gate.

The signal at A will emerge at Y when E is high, but Y will be completely disconnected
when E is low, allowing another device to drive the node at Y. Inverters are used in the path so
that loading on the Y terminal does not affect the signal at the A input; unlike gates,
transmission gates have no gain, and when on, do not isolate output loads from the input.
Multiple signals can be selectively connected to a single destination by the use of the MUX
function, or, alternatively, by the use of the tristate buffer. The mux allows the selection of
input signals to a single destination, while the tristate buffer allows signals from various
origins to be placed onto a common bus, where the signals may be picked up at numerous possible
destinations along the bus. If a tristate bus is used, care must be taken to ensure that a signal
from some source is always driving the bus, or a bus-hold function should be used to keep the
last bus state from changing due to leakage. Mid-supply logic signals within a CMOS IC will
cause excessive power dissipation. The bus hold is simply two weak inverters.

The inverters must be weak enough for the held logic state to be overcome by any device
driving the bus; its purpose is only to keep the bus at a full logic 0 or 1 when not being driven.
The set-reset flip-flop can be of two types, true input, using NOR gates, or false input, using
NAND gates.

For the NOR type, both inputs are normally low; a high pulse on an input will force an
output condition that will persist after the pulse has returned to GND. The NAND type
normally has its inputs high, and responds to negative pulses.
A delay can be provided by a series of inverters; the delay will be inverting if there are an
odd number of inverters, and will be noninverting if the number of inverters is even.
A pulse generator can be made from a gate and an inverting delay. The NAND2 function
and an inverting delay provide a negative pulse on rising input transitions.


The pulse duration will be approximately that of the total delay through the inverter chain.
A positive pulse can be provided in response to the falling edge of an input signal using an
inverting delay and a NOR gate.

A positive pulse can be generated on both rising and falling edges by the use of a
noninverting delay and an XOR gate.

Beware when using pulses to do useful work; if the delay is insufficient, the resulting pulse
may be too short to propagate through a system. Delay elements with reasonable delay values
can be made as cells that have fairly well-defined delay periods.

Shift Registers
Shift registers are simply a sequence of flip-flops provided with a common clock.


Q0 will contain the most recent input data, Q3 the oldest. For the shift register to be
operated reliably, there should be no skew in the timing of the clock signals—the clock signal
should be supplied by a single source. The propagation delay of a flip-flop is very short, so the
output of one circuit will transition almost immediately after the clock rising edge. The shift
register is often used to accept a serial bitstream and convert it into a parallel data word.
A shift register can be synchronously loaded for serializing a parallel word into a serial
output bitstream.

The data on D0-D3 will transfer to the flip-flops on the rising edge of C when the signal
LOAD is high. The first bit to emerge at Q will be D0, and subsequent bits, while LOAD = 0,
will arrive at Q on each rising edge of C, followed by zeros, since the A input of the MUX at
the extreme left is grounded.
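The load-then-shift behavior can be modeled in a few lines (Python sketch; `serialize` is a hypothetical helper name, and a list plays the role of the flip-flops):

```python
def serialize(word, n_cycles):
    """Model of a parallel-load shift register: capture D0..D3 while LOAD
    is high, then shift toward Q on each clock, filling with zeros."""
    regs = list(word)          # LOAD high: parallel word captured on the edge
    out = []
    for _ in range(n_cycles):  # LOAD low: one output bit per rising edge
        out.append(regs[0])    # D0 emerges at Q first
        regs = regs[1:] + [0]  # the grounded mux input feeds zeros in
    return out

bits = serialize([1, 0, 1, 1], 6)  # D0..D3 followed by zeros
```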

Clock Synchronization
A master clocking scheme is used to coordinate the functions on an IC, but due to delays as
the clock signal propagates through buffers, a signal may arrive at a distant block a little
earlier or later than the local clock, which can cause ambiguity in interpreting the data
correctly. In this case, we can delay the signal by one clock cycle but capture the data
unambiguously through backside clocking.

We expect the data to change just after the clock rising edge, but timing skew across the
chip may make the local clock late by a tiny margin; this circuit clocks the first registers on the
falling edge of the clock, and the second set on the rising edge. A one cycle delay is imposed
on the signal, but potential ambiguity is eliminated.

Counters
A flip-flop with D driven by QN is called a toggle flip-flop.

The output will change on each rising clock edge, effectively dividing the input frequency
by 2.


When we build counters, it is convenient (but not necessary) for them to output a binary
sequence.

The output changes on the rising edge of CLK. The zero state of the counter is when all
output bits are low. Notice that in the binary sequence, a bit changes coincident with
the falling edge of the next lower-order bit.
Counters can be organized in two basic ways, the ripple counter and the synchronous
counter.

The ripple counter is a sequence of toggle flip-flops. This example shows the outputs taken
from the Q outputs, which will result in a proper binary sequence.
The ripple counter has one disadvantage; the output bits do not change at one moment in
time after a rising clock edge, but instead the changes ripple through the output bits, from the
lowest order bit (Q0) to the highest order bit (Q3). If we want outputs that change
simultaneously, we must either register the ripple counter output or build the more
complicated synchronous counter.


Each bit of the synchronous counter accepts information from lower order bit outputs, and
toggles on the next clock only when all lower order bits are outputting high; this indicates that
on the next clock they will all go low, and it is time for the bit in question to toggle. This
requires that lower order bit conditions be propagated to the higher order bits through the
NAND2s and inverters at the bottom of the drawing. This propagation time sets a limit on the
maximum clock rate that a synchronous counter can accept.
Ripple counters can accept very high clock rates, but their outputs do not settle
simultaneously; synchronous counter outputs settle simultaneously, but are limited in
maximum clock rate. The longer the synchronous counter, the lower the allowable maximum
clock frequency.
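The toggle rule described above is easy to check in software. The sketch below (Python; bits are listed LSB first) steps a 4-bit behavioral model and reproduces the binary sequence:

```python
def sync_counter_step(bits):
    """One clock of a synchronous counter; bits[0] is the LSB.
    A bit toggles only when every lower-order bit is high."""
    new = []
    toggle = True                      # the LSB toggles on every clock
    for b in bits:
        new.append((b ^ 1) if toggle else b)
        toggle = toggle and (b == 1)   # propagate condition to higher bits
    return new

state = [0, 0, 0, 0]
seen = []
for _ in range(16):
    state = sync_counter_step(state)
    seen.append(sum(bit << i for i, bit in enumerate(state)))
# seen runs 1, 2, 3, ... 15, then rolls over to 0: the proper binary sequence.
```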
The synchronous counter can be reset asynchronously by employing reset flip-flops, or
synchronously by adding gate functions at the D terminal of each register to force zeros into
the registers on the next rising clock edge.
As an alternative to the synchronous counter, consider the clocked-output ripple counter.


A state machine is a clocked sequential circuit that transitions from one state to the next,
depending on control information. The classic state machine is a multibit register with logic
circuitry connecting the output to the input, with external bits that can redirect the output
sequence. Most state machines for signal processing are controlled by a program counter. In
classic state machine terms, a program counter is an adder coupling the register’s output
back to its input, with an added value of one.
In a continuous signal processing architecture, an N-bit program counter counts to its
maximum count, and automatically rolls over to state 0 on a continuing basis, every 2^N clock
periods. Counter outputs are decoded by gates and flip-flops into control signals that are
required to enable, redirect, address, and register signals within the signal processor’s
hardware. The outputs of a program counter can connect to gates that provide these control
signals, beginning and ending at specific moments within the program counter’s cycle.
When deriving signals with gates, very brief false signals can occur at the gate outputs, due
to slight timing skew between the counter outputs. Signals should only be generated directly
by gates when the transitions into those gates have well-established timing; backside clocking
may be required to obtain signals that are clean and stable, or gate outputs can be cleaned up
by flip-flops that are clocked by the program counter clock. The glitching problem is
illustrated next.


Binary Numbering Systems


A binary counter of N bits will count from zero to (2^N) − 1, whereupon the next state will be
zero again. The number of states, including zero, is 2^N. This is the range of positive integers
for an N-bit binary word. Negative numbers may be specified with an added sign bit,
providing a sign-magnitude numbering format. All calculations can then be performed with
mathematics that works only for positive numbers, and the sign of the result can be considered
separately. In sign-magnitude, there are two possible zero values.

Binary code    Sign-magnitude meaning


0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 −0
1001 −1
1010 −2
1011 −3
1100 −4
1101 −5
1110 −6
1111 −7

The most useful numbering scheme in binary mathematics is the 2’s complement
numbering system. This system uses the most significant bit (MSB) of the binary word to
indicate sign.

Binary code 2’s complement meaning


0000 0
0001 1
0010 2
0011 3
0100 4
0101 5

0110 6
0111 7
1000 −8
1001 −7
1010 −6
1011 −5
1100 −4
1101 −3
1110 −2
1111 −1

In this format there is only a single zero value, which is positive; the maximum positive
value is 7 (in a 4-bit word), and the maximum negative value is −8. If we rearrange the codes
so that their values are continuous:

0111 7
0110 6
0101 5
0100 4
0011 3
0010 2
0001 1
0000 0
1111 −1
1110 −2
1101 −3
1100 −4
1011 −5
1010 −6
1001 −7
1000 −8

We see that all positive numbers have a sign bit of 0, and that negative numbers have a sign
bit of 1. Further, that adding 1 to any number in the series gives us the next higher number,
with the exception of the maximum positive value, which wraps around to the maximum
negative value.
A few simple rules govern the use of 2’s comp numbers. A 1’s complement operation is
performed by inverting all of the bits in the word. A 2’s complement operation is performed
by inverting all bits and adding 1 to the result; this is effectively the negation operation, which
simply changes the sign of the number. To convert a signed binary number into 2’s comp,
we do nothing if the sign bit is zero, but if it is a one, we 2’s complement the remaining bits.
Once in 2’s comp format, all additions and subtractions can be performed with normal
binary mathematics.
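These rules translate directly into code. The sketch below (Python, with a hypothetical 4-bit word width) performs the 1's complement, 2's complement, and signed interpretation described above:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111 for a 4-bit word

def ones_complement(x: int) -> int:
    """Invert every bit of a WIDTH-bit word."""
    return x ^ MASK

def twos_complement(x: int) -> int:
    """Negate: invert all bits, then add 1 (modulo the word size)."""
    return (ones_complement(x) + 1) & MASK

def to_signed(code: int) -> int:
    """Interpret a WIDTH-bit code as a 2's comp value (MSB is the sign)."""
    return code - (1 << WIDTH) if code & (1 << (WIDTH - 1)) else code

# The text's example: 1111 (-1) -> invert -> 0000 -> add 1 -> 0001 (+1).
assert twos_complement(0b1111) == 0b0001
assert to_signed(0b1000) == -8 and to_signed(0b0111) == 7
```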


0 + 0 = 0
0 + 1 = 1, no carry
1 + 0 = 1, no carry
1 + 1 = 0, with a carry to the next bit
1 + 1 + carry = 1, with a carry to the next bit

An example of the 2’s complementing of −1: Starting with 1111 = −1, we invert the bits to
obtain 0000, and add 1 to obtain 0001 = +1.
The 2’s comp scheme is very useful during addition and subtraction operations.
Multiplication is in effect the selective addition of shifted copies of an input value, and
likewise benefits from the ease with which 2’s comp numbers are added.
Subtraction, for example A-B, is performed by the addition of A to the 2’s complemented
(sign changed) version of B. In practice, this means applying the A value to one input of an
adder, and the inverted version of B to the other while setting the carry input of the LSB adder
to a logic 1; this effectively performs the 2’s complement of B automatically. If the result is
within the 2’s comp system’s range, it will be correct for all combinations of the signs of A
and B. In the 2’s comp numbering system, the sign is a natural part of the number and is
carried through all simple operations correctly; any carry produced at the MSB of the addition
is ignored.
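The adder-based subtraction can be modeled directly (Python sketch; the final masking plays the role of ignoring the carry out of the MSB):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def subtract(a: int, b: int) -> int:
    """A - B as the adder computes it: A plus the bitwise inverse of B,
    with the LSB carry-in forced to 1; the MSB carry out is ignored."""
    return (a + (b ^ MASK) + 1) & MASK

def to_signed(code: int) -> int:
    """Interpret a WIDTH-bit code as a 2's comp value."""
    return code - (1 << WIDTH) if code & (1 << (WIDTH - 1)) else code

# 3 - 5 gives code 1110, i.e. -2; the sign falls out naturally.
assert to_signed(subtract(0b0011, 0b0101)) == -2
# (-2) - (-1) gives -1, correct for negative operands as well.
assert to_signed(subtract(0b1110, 0b1111)) == -1
```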

Saturation Limiting
If an operation results in a value that is out of range, the result will be terribly incorrect; in our
4-bit example, 5 + 5 would deliver a −6 result. Since signal processing machines are dealing
with signals that have a defined range, such a result would be better limited to that range,
instead of wrapping around the 2’s comp numbering system—limiting to the extremes is
more correct than accepting a wildly incorrect result. To perform such saturation limiting
during addition, we first sign extend the 2’s comp values by adding an additional bit to the
left of each number, a copy of its sign bit. In effect, we have two sign bits from each value
going into the adder, and two at the output. Any result that is correct will have both sign bits
equal; an out-of-bounds result will leave the two sign bits unequal. We can use this to force
the correct maximum code in place of the erroneous result. The saturation limited result is
then stripped of its extra sign bit.

 00101 (+5)
+00101 (+5)
 01010 (out of range)
  0111 (saturation limited result)

In this example, the correct sign is expressed by the extended sign bit, indicating that the
result was indeed a positive value, but the inequality of the two sign bits indicates the result is
out of range. The saturation
limiting circuitry will force a 0111 result. The negative saturation limit would be 1000.
Additions in 2’s comp are performed with an additional MSB bit to accept the sign
extension. The circuitry for performing addition with saturation limited results is illustrated in
Figure 8.1.
A3 and B3 are carried into an extra adder bit as sign extension. The XOR gate finds
inequality between the sign bits. The NAND2 gates in the signal path force the proper
saturation limited condition. Notice that the MSB NAND gates are driven from control signals
that are opposite to those of the lower bits, which allows us to produce correct saturation
limited results.
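The whole scheme is compact enough to model in a few lines. The sketch below (Python; a behavioral model of the intent of Figure 8.1, not its gates) sign extends, adds, compares the two sign bits, and clamps:

```python
WIDTH = 4

def saturating_add(a: int, b: int) -> int:
    """Add two WIDTH-bit 2's comp codes with one bit of sign extension;
    if the two result sign bits disagree, clamp to the nearest extreme."""
    def sign_extend(x):
        sign = (x >> (WIDTH - 1)) & 1
        return x | (sign << WIDTH)          # copy the sign into an extra MSB
    total = (sign_extend(a) + sign_extend(b)) & ((1 << (WIDTH + 1)) - 1)
    s_ext = (total >> WIDTH) & 1            # extended (correct) sign bit
    s_msb = (total >> (WIDTH - 1)) & 1      # ordinary sign bit
    if s_ext != s_msb:                      # sign bits unequal: out of range
        return 0b0111 if s_ext == 0 else 0b1000   # clamp to +7 or -8
    return total & ((1 << WIDTH) - 1)       # in range: strip the extra bit

# 5 + 5 clamps to 0111 (+7) instead of wrapping around to 1010 (-6).
assert saturating_add(0b0101, 0b0101) == 0b0111
assert saturating_add(0b1011, 0b1011) == 0b1000  # (-5) + (-5) clamps to -8
assert saturating_add(0b0011, 0b0010) == 0b0101  # 3 + 2 = 5, in range
```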
We can also see why the adder schematic is drawn with the carry-in on the right and the
carry-out on the left. We can draw schematics with the numerical system in mind, where
traditionally the least significant bits are on the right, and the carry signal propagates toward
the left, as we would when manually adding decimal numbers. Typically, both input signals
will be applied to the adder at once, whereupon the sum and carry outputs of each bit will
attempt to settle. A given bit within the adder cannot settle until its carry is stable; this means
that all bits to the right have settled into a final state. The speed of an adder depends on the
number of bits through which the carry must propagate, and the propagation delay of the
carry-out circuitry within the adder cell.

Figure 8.1 Saturation limiter example.


Look-Ahead Carry Generation


Wide adders can be made faster by the use of look-ahead carry circuits, attached to the adder
in groups: each group predicts its carry output and delivers it to the next group before the
carry can fully ripple through. The use of look-ahead complicates an
otherwise simple structure. A 4-bit look-ahead carry generator is shown in Figure 8.2.
The look-ahead block is wired to the inputs of a 4-bit adder block, and the carry out is used
as a carry signal to the next 4-bit block. Each pair of input bits is evaluated through a NAND
gate to determine if a carry is generated at that bit location (1 + 1 = 0 and a carry out), and a
NOR gate determines if a carry could be propagated through that bit location (1 + 0 + carry =
0 and a carry out). These bits are gathered by the remaining gates. The structure can be
designed to cover more than 4 bits by following the basic structure shown.
Look ahead can also improve the maximum clock rate of the synchronous counter; the
required circuitry is obviously much simpler.
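The look-ahead principle can be modeled behaviorally. I use positive logic here for readability, where the actual circuit uses inverted-logic NAND/NOR gates, and the function name is mine:

```python
def group_carry_out(a, b, cin):
    """Carry out of a 4-bit group, evaluated by look-ahead rather than
    by rippling.

    g[i] = a[i] AND b[i]  -- bit i generates a carry on its own
    p[i] = a[i] OR  b[i]  -- bit i would propagate an incoming carry
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    # A carry exits the group if any bit generates one and every bit
    # above it propagates, or if all four bits propagate the carry in.
    return (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & cin))
```

The flattened sum-of-products is why the look-ahead result settles in a fixed few gate delays regardless of where the carry originated.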

Figure 8.2 Look-ahead carry example.


Multipliers
Multiplication is the selected summation of a successively shifted input multiplicand word, the
selection of additions at each shift location depending on the state of corresponding bits in the
multiplier word. The final summation of partial products is the output of the multiplier. To
perform such operations on 2’s complement numbers, we must use the arithmetic shift,
where right-shifting causes the sign bit to be copied into the new left most bit position.
A multiplier can be implemented as a shift-and-selectively-add machine that uses very few
gates, but the structure is slow in operation. More commonly, an array multiplier is used that
accepts one input along the top, passing it vertically down through the array, with the other
input along the left side, carried by horizontal lines across the array. The most significant
outputs will appear along the bottom edge, and the least significant outputs appear along the
right edge. For most signal processing applications, only the MSB product values are used.
Each row of adders is sign extended one bit to the left, and accepts the vertically passed
inputs through AND gates that allow that row to sum those input values or zeros instead. More
conveniently, the inputs are inverted at the top and side, and NOR functions are used local to
the adders, since NOR gates are smaller and faster. Each gated adder row passes its output to
the next row down the array, shifted to the right by one bit position, sign extending into the
remaining left-most column. The top row of adders is controlled by the inverted LSB of the
left input, and subsequent gated adders down the array are controlled by the successively
higher-ordered (and inverted) left input bits. At the bottom, the MSB of the inverted left input
gates the addition of the complement of the upper input signal, and passes a noninverted
version of that control signal to the carry input of the bottom adder row, effectively subtracting
the full value of the upper input in the case of a negative left-side input value.
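The bit-level behavior just described — add a shifted copy of the multiplicand for each multiplier bit, but subtract the full weight for the sign bit — can be sketched as a shift-and-add model. This models the arithmetic only, not the array layout, and the names are mine:

```python
def signed_mult(a, b, bits=8):
    """2's complement multiply by shift-and-selectively-add.

    Each low bit of the multiplier `b` selects a shifted copy of `a`;
    the MSB of `b` is the sign bit and subtracts the full weight of `a`
    instead, just as the bottom row of the array multiplier does.
    """
    acc = 0
    for i in range(bits - 1):
        if (b >> i) & 1:
            acc += a << i              # add shifted multiplicand
    if (b >> (bits - 1)) & 1:
        acc -= a << (bits - 1)         # sign bit: subtract, not add
    return acc
```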
The propagation delay of the multiplier, as described, requires adder carry propagations
across the top of the array, adder sum delays diagonally from the upper left corner to the
bottom right corner, and a final carry propagation delay along the bottom edge to the leftmost
corner. Propagating the carry signal vertically down the array, instead of across a row can
reduce multiplier settling time—in this case, it is a race between sum propagation and carry
propagation to the left end of the bottom adder row.
The block multiplier can be made from customized adder and NOR function cells that abut
into a rectangular shape, with some special cells around the array for preparing input signals
and the special treatment of the MSB at the bottom. The propagation time is reasonable for


small multipliers in sub-50 MHz systems designed in 0.35 μ, but higher performance will
require considerable deviation from the block approach, none of which arrange as nicely in a
layout.
Booth’s multiplier is a clever arrangement that requires recoding of the left input bits
to control adder rows that have the capacity to add, subtract, and shift the added/subtracted
value, reducing the number of adder rows. The basic principle is something like this: 1 + 2 + 4
= 7, but 8 − 1 = 7 too. Using fewer adder stages makes Booth’s multiplier a little faster,
but not necessarily smaller, and certainly not simpler.
When designing a block multiplier, you may wish to consider the pitch of the cells you
build, so that the multiplier abuts to circuitry at the top or bottom, which is often an SRAM,
and circuitry at the side, which is often a ROM. This would be typical in an FIR filter
application.
When multiplying 2’s complement numbers, careful attention must be paid to rounding; in
the above-described multiplier, all numbers round “down,” but it is often desirable for them
to round toward zero instead. This may be accomplished by gating the carry into the final
adder, ensuring that the carry is not set under the conditions of both input numbers being
negative. DSP systems that do not round toward zero may output persistent negative values at
the LSB level, despite the inputs being zero. Be sure to check behavior under all conditions, as
persistent signals into recursive structures can give rise to limit cycles that degrade
performance.

Digital Filtering
This is not the place to fully explain digital filtering, but I would like to take a quick swipe at
basic concepts and some filtering structures to introduce the hardware implications. Rabiner
and Gold is a good place to start learning about sampled systems and filters, and the works are
quite complete. Here, I’ll try to show how binary math can be used to process signals, and
some structures that lead to processor design.
Signals are sampled at a sampling rate Fs, and the resulting sample sequence will represent
information frequencies from DC to Fs/2. Every sampled frequency F will have an alias
frequency equal to Fs-F. Therefore, all frequencies above Fs/2 must be removed prior to
sampling, or they will appear in their aliased form below Fs/2 along with the desired baseband
information, causing incorrect sampled values.
Sampled signals are quantized into binary values for signal processing. The accuracy with
which a signal is quantized is reflected in the width of the binary number that represents each
sample; the magnitude of the least significant bit will determine the minimum magnitude of
quantization noise that will accompany the signal once in the digital domain. Digital signal
processing allows the manipulation of signals


with known mathematical precision and does not vary with component tolerances, as is the
case with analog processing.

FIR Filters
Quantized signals can be filtered for any number of purposes by the two classes of filters:
finite impulse response filters (FIR) and infinite impulse response filters (IIR). The FIR filter
performs the convolution of an input signal with a set of stored coefficients. Often the required
frequency response is low pass, coefficients for which can be obtained by sampling the sin x/x
(sinc) function, which unfortunately extends infinitely in both directions from a central peak.
The sinc function is generated by evaluating sin(x)/x, where x is in radians from minus infinity
to plus infinity, and in the unique case of x = 0, infinity is resolved by forcing the result to 1.
The series can be truncated to a finite series of values by gently attenuating the values toward
the ends—called windowing—but the best solution is the Parks-McClellan (Remez exchange)
program. This program will calculate a set of coefficients based on desired filter
characteristics, with better control over parameters than the windowing method. A typical
coefficient set for a low-pass FIR filter plots as a sampled sinc shape.
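As a sketch of the windowing approach (using a Hamming window as one common choice — the Parks-McClellan program would do better; the function name is mine):

```python
import math

def lowpass_fir(num_taps, fc):
    """Low-pass FIR coefficients: the sampled sinc, Hamming-windowed.

    fc is the cutoff as a fraction of Fs (0 < fc < 0.5). The window
    gently attenuates the values toward the ends so the infinite sinc
    series can be truncated to num_taps coefficients.
    """
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        m = n - mid
        # sin(x)/x with the x = 0 case resolved to the limit value
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    return h
```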

Notice the FIR filter coefficient sequence is symmetrical; therefore, only one-half of the
values need be stored in a ROM, the remaining half being generated by the stored half, using
complemented addresses to the ROM. This symmetrical coefficient set produces outputs that
are of linear phase, meaning that all output frequencies are delayed by the same amount. The
use of unsymmetrical coefficient sets will allow filter delay to vary with frequency; such
sets need to be fully stored in ROM.
In a typical low-pass filter, the input samples must also be stored so that every input sample
can be multiplied by a corresponding coefficient; the filter output sample being the sum of all
multiplication results.


The next input sample then shifts into the memory, the oldest sample is discarded, and the
process repeats.
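The store-multiply-sum-shift cycle can be modeled directly. A behavioral sketch, with a Python list standing in for the sample memory:

```python
def fir_step(delay_line, h, x):
    """One FIR output sample.

    Shift the new input into the delay line, discard the oldest sample,
    and sum the products of each stored sample with its coefficient.
    """
    delay_line.insert(0, x)            # newest sample first
    delay_line.pop()                   # oldest sample discarded
    return sum(s * c for s, c in zip(delay_line, h))
```

For example, with h = [0.25] * 4 (a 4-tap moving average) a step input ramps up to the input value over four samples.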

Low-pass filter functions are widely used in decimators and interpolators, as their response
can be easily tailored to suit any requirement of frequency and phase response. The price is a
required delay through the FIR filter, which is unavoidable. The delay period in
the case of the above coefficient set is eight samples—essentially the delay to the peak of the
coefficient set. FIR filters benefit from the use of dedicated circuitry, although they can
obviously be performed by general-purpose architectures.
As a side note, the frequency response of the FIR filter can be determined by the Fourier
transform of its coefficient set. The Fourier transform is a very simple math tool, and for those
that are intimidated by the term “e to the power of j omega t,” consider what goes on when
we analyze the above structure.
The input signal is passed through a delay (memory), and the output is the sum of each
memory location (delay time) times the corresponding coefficient. We are interested in the
frequency response of the filter, so we pass signals of different frequencies through the filter
and see what comes out.
A sine wave can be imagined as a point traveling around a circle, once per cycle. Sine
waves are generated in a computer using radians as a unit of angle measurement, and we know
that there are 2π radians in a circle. If we imagine our sample frequency, Fs, as having a
period of 2π radians, then all signal frequencies will be a fraction of this sample frequency:
one-half the sample frequency and below. We know that each stage of the memory output is
delayed by a sample period from the previous location, so we can calculate the phase shift that
a sine or cosine wave of a given frequency will have at each point along the delay. In order to
do the analysis properly, we need to evaluate the phase shift


for both sine and cosine waves simultaneously. We keep the two separate through all of our
multiplications with the coefficient set, and add them to obtain a total sine component and a
total cosine component for each frequency analysis. From this we can calculate the amplitude
as the square root of the sum of the squared components, and the phase angle as the arctangent
of the ratio of sine total to cosine total. This analysis can be easily coded in any high-level
language with perhaps a dozen lines of code. I discovered this as a kid while attempting to
determine the response of a tapped delay, only later to discover that educated people call it a
Fourier transform.
You can’t evaluate such systems with a single sine wave alone, for a sine wave is a signal
that is continually varying—since sine and cosine are phase shifted from each other by one-
quarter of the way around the circle, the sum of their squares equals 1 at all times. Using both
sine and cosine waves in our evaluation of digital systems allows us to analyze amplitude and
phase response correctly. Sounds complex, eh?
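For the curious, the whole analysis really is about a dozen lines. A sketch, with f expressed as a fraction of Fs and the function name my own:

```python
import math

def fir_response(h, f):
    """Amplitude and phase of an FIR coefficient set at frequency f.

    Correlates the coefficients with a sine and a cosine of the test
    frequency -- the sum-of-products described in the text (a Fourier
    transform in disguise). Each tap n is one sample period of delay,
    so its phase shift is 2*pi*f*n.
    """
    w = 2 * math.pi * f
    cos_sum = sum(c * math.cos(w * n) for n, c in enumerate(h))
    sin_sum = sum(c * math.sin(w * n) for n, c in enumerate(h))
    amp = math.hypot(cos_sum, sin_sum)          # sqrt of sum of squares
    phase = math.atan2(sin_sum, cos_sum)        # arctangent of the ratio
    return amp, phase
```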
This brings me to a second point I’d like to drive home with as much emphasis as
possible. Earlier, I made the statement that you don’t need to know how something works,
only how it behaves. I’m not proposing anarchy in academia with this second notion, but,
consider: you don’t have to know how someone else understands something, provided you
understand it yourself, and in whatever terms you like. Basically, if it works for you, use it, but
remember that if you need someone to teach you how it all goes, you must first learn their
“language” and learning the language is usually more difficult than understanding what is
to be taught.
I’m doin’ my best here to use the simplest language possible, and hope the effort is
appreciated.

Interpolators and decimators


The interpolation process produces samples at a higher output rate, some multiple of the input
sample rate, and requires a low-pass function in the filter to ensure that output samples only
represent baseband information. In this case, the operations proceed at the higher output
sample rate, but fortunately, not all multiplications of the coefficient set are required. For
example, if the output sample rate is four times the input rate, only every fourth coefficient
value is valid; there are no intervening input samples (from an output sample rate perspective)
that correspond to the remaining coefficients, so, during their convolution, these products are
zero. This simplifies the interpolator, despite the fact that it is essentially running at a high
rate.
Decimators are intended to reduce the sample rate, and use the low-pass function to remove
any aliasing that would occur by simply taking, say, every fourth input sample as an output.
This filter can run at the lower output sample rate, allowing more time for convolution
completion.
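A behavioral sketch of a 4× interpolator showing the skipped multiplies (the names are mine, and h would come from a low-pass design at the output rate):

```python
def interpolate4(x_samples, h):
    """4x FIR interpolator exploiting the zero-stuffed input.

    For output phase p, only coefficients h[4m + p] ever meet a nonzero
    input sample, so three-quarters of the multiplies are skipped.
    """
    taps = (len(h) + 3) // 4           # input-rate history depth
    hist = [0.0] * taps                # newest input sample first
    out = []
    for x in x_samples:
        hist = [x] + hist[:-1]
        for phase in range(4):
            acc = 0.0
            for m in range(taps):
                k = 4 * m + phase
                if k < len(h):
                    acc += hist[m] * h[k]
            out.append(acc)
    return out
```

With h = [1, 1, 1, 1] (a crude zero-order hold kernel) each input sample is simply repeated four times at the output rate.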


IIR Filters
IIR filters are computationally less regular than the FIR, certainly more difficult to design, and
suffer phase distortions that may not be acceptable. For sharp and deep rejection of unwanted
signals, the IIR may be the better choice, but the overall simplicity and ease of coefficient
generation make the FIR friendlier. IIR filters are designed by first designing an analog filter,
from which a transfer function is derived; this is then warped and transformed into a set of
delays and coefficients that can be used in the digital domain. The process is tedious, but
straightforward. Single-ordered (single delay element) IIR filters can be easily understood
intuitively, but higher order filters cannot.
The simplest IIR low-pass filter, analogous to an RC filter, requires a single word-wide
register. The filter is calculated by two multiplies and an addition.
Note: We assume that in all of these algorithm flow diagrams that registers are regularly
clocked at the sample frequency.

This filter can be best understood if you start with the register zeroed and apply a step input
signal; the output will gradually approach the input value, step by step—a stepwise
analogy to an RC filter, provided K1 + K2 = 1.
K1 sets the −3 dB frequency of the filter, and K2 sets the gain. For unity gain, K2 = (1 −
K1). For a given −3 dB frequency F and a sample period t, K1 = e^(−2πFt).

Since F = 1/(2π × TC), where TC is a time constant, we can imagine the time constant of
the filter by observing the magnitude of the first step (set by the input coefficient, K2), and
extrapolating the rate linearly out to a magnitude of the filter’s gain (1/K2), noticing the time
period required in clock cycles. This would approximate the time constant of the filter, and
from this we can predict the 3 dB rolloff point of the filter.
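The filter can be modeled in a few lines (floating point here, with f3db as a fraction of Fs; the name is mine):

```python
import math

def one_pole_lowpass(x_samples, f3db):
    """Single-register IIR low-pass: reg = K1*reg + K2*x.

    K1 = exp(-2*pi*F*t) for -3 dB frequency F and sample period t
    (t = 1 here, so f3db is a fraction of Fs); K2 = 1 - K1 for unity
    gain, so K1 + K2 = 1.
    """
    k1 = math.exp(-2 * math.pi * f3db)
    k2 = 1 - k1
    reg = 0.0
    out = []
    for x in x_samples:
        reg = k1 * reg + k2 * x        # two multiplies and an addition
        out.append(reg)
    return out
```

Starting with the register zeroed and applying a step input shows the stepwise RC-like settling described above.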
As a signal process, the operations could be as follows: Read input to multiplier along with
K2, load an accumulator with the product, read the register into the multiplier along with K1,
add the product to the


accumulator, and write the accumulator output back to the register. This implies the system
can be programmed to perform these tasks, and that the multiplier has an accumulator at its
output. An accumulator is simply an adder feeding a register, which holds the accumulated
results and feeds its output back to the adder—this is essentially an integration function. Such
an arrangement is common in signal processors, as the multiplier rarely stands alone, but is
augmented with a resettable accumulator—the resulting combination is called a multiplier-
accumulator (MAC). Usually, a signal processor is organized around the MAC, with only a
few control signals that clock the accumulator and temporarily break the register feedback to
the adder to force a load, as opposed to an accumulate operation.
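A minimal MAC model — behavioral only, where the load control plays the role of the "break the register feedback" signal mentioned above:

```python
class MAC:
    """Multiplier-accumulator: an adder feeding a register, with the
    register's output fed back to the adder.

    `load` breaks the feedback so the product is loaded into the
    accumulator instead of being added to it.
    """
    def __init__(self):
        self.acc = 0.0

    def step(self, a, b, load=False):
        p = a * b                      # multiplier output
        self.acc = p if load else self.acc + p
        return self.acc
```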
Since two multiplies require two cycles to complete in a single multiplier system, an
alternative approach to the single stage IIR filter can deliver unity gain with just one multiply.

Note that the first addition is to the 2’s complemented value of the register output, which
is a subtraction operation. We can use this structure only because K2 = 1 − K1. K1 is implicit
in the above algorithm. The signal from the register back to its input is REG + (−REG × K2),
which is the same as REG × K1, provided K2 = 1 − K1.
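In model form (the structure, not the hardware):

```python
def one_pole_single_multiply(x_samples, k2):
    """Unity-gain one-pole low-pass with a single multiply per sample.

    reg = reg + K2*(x - reg) is the same as K1*reg + K2*x with the
    K1 = 1 - K2 constraint built in -- K1 never appears explicitly.
    """
    reg = 0.0
    out = []
    for x in x_samples:
        reg += k2 * (x - reg)          # subtract, multiply, accumulate
        out.append(reg)
    return out
```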
High-pass filters can be made by subtracting the output of a low-pass filter from its input
signal; you can use this to remove DC offsets from signals. This brings us back to the same
single multiply low-pass filter, with the output taken from a different place.


Figure 8.3 Higher order IIR filters.

The circuit can be seen as an integrator that will accumulate any DC offset at the output,
which it then subtracts from the input signal.
All of this implies that a signal processing architecture may be augmented with additional
features, such as adders prior to and following the multiplier that can be fed signals from
various locations through muxes or tristate busses. Also, certain paths may include XOR gates
so that the signal may be one’s complemented; the control signal can also be fed directly to
an adder’s carry input to correctly perform the 2’s complement negation. Simple functions
like this one can then be performed using a single multiplication cycle.
Higher order IIR filters are illustrated in Figure 8.3.
This basic structure uses coefficients that are difficult to derive, but more importantly, for
low-frequency filtering, it often requires coefficients that cover a wide range of values. If you
experiment with this filter structure, you will find few problems, provided you do your math in
a high-level language with floating point mathematics. The decision to put a floating point
processor in your IC design, however, should not be made lightly. Effort put into reducing the
required coefficient and signal word widths early on will pay back well later in a smaller,
simpler system.
The IIR filter above can be used with simple coefficients to do useful work, but only if the
filtering frequency is at the upper end of your signal spectrum, near Fs/2, where crude
coefficients work best. Consider this structure.


The feedback coefficient would be negative for this intended function. The signal out of the
2-stage delay will be exactly out of phase from its input at a signal frequency of Fs/4, as it is
delayed by a 2/Fs time period. If the feedback coefficient is negative, the feedback will
reinforce this frequency to the exclusion of others. If K is −1.0, the filter will have an infinite
Q and gain at precisely Fs/4. Realistic Qs and gains can be obtained by providing two parallel
multiplies: the first would be a −1.0, simply the addition of the second register’s
complement; the second could be a small shifted value. As an example, a 10-bit shift gives a
total feedback value of 1 − (1/1024) = 0.9990234, with a gain of 1024. A multiplier value of
0.9990234 would do as well.
This algorithm can be used to accept an Fs/4 input that is buried in noise, and produce a
clean signal due to its narrow bandpass function. With the hope that this inspires ideas: A
signal buried in noise can be quantized to a single bit, as the noise of single bit quantization
will most likely not degrade the already poor signal-to-noise ratio. Yes, weak and noisy signals can
be quantized with a simple comparator, fed through a sharp bandpass filter; the output of
which can be evaluated in terms of amplitude and phase. Although this is effectively an
averaging process, which takes time to settle, SNR improvements are only limited by the time
allowed to make the measurement and the width of the registers used. An SNR improvement
of 60 dB is not unreasonable.
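A sketch of the Fs/4 resonator with the shifted-value feedback; shift = 10 gives the 0.9990234 coefficient magnitude and a gain of about 1024 (the name is mine):

```python
def fs4_resonator(x_samples, shift=10):
    """High-Q band-pass at Fs/4: feed the output, delayed two samples,
    back with a negative coefficient just shy of -1.0.

    -(1 - 2**-shift) is the "-1 plus a small shifted value" trick: the
    -1 is just an addition of the complement, and 2**-shift is a bit
    shift, so no real multiplier is needed.
    """
    k = -(1.0 - 2.0 ** -shift)
    d1 = d2 = 0.0
    out = []
    for x in x_samples:
        y = x + k * d2                 # d2 is the output two samples ago
        d2, d1 = d1, y
        out.append(y)
    return out
```

Driving it with a full-scale Fs/4 input shows the resonance building up toward the ~1024 gain over time, which is exactly the averaging behavior that buys the SNR improvement.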
In cases where the peaking frequency must be swept arbitrarily, the following structure can
be used, and although precise coefficients can be developed through the tedious
transformation process, the coefficients for this filter can be adjusted manually by following a
few rules to get the response you need.

K2 is negative, and sets the peaking frequency. As K2 approaches −4, the center frequency
approaches Fs/2. At lower frequencies, the relationship between K2 and frequency is not
linear; to cut the peaking frequency by an octave requires K2 to be reduced by a factor of 4.
K1 is


Figure 8.4 Biquad or state-variable filter.

positive, and controls the Q and gain, which will approach infinity at a K1 value of
approximately 1.0, and very low Q when K1 = 0. The two coefficients interact only to a
limited extent. With this understanding in mind, the filter can be “tuned” experimentally.
The previous filter produces excellent peaking at lower frequencies, but the K2 value will
become very small for very low frequency operation. To allow the use of more reasonable
multiplier values, we can borrow the biquad or state-variable filter from the analog world,
directly, as shown in Figure 8.4.
The filter consists of two integrators and a Q setting feedback path through Rq that
produces high-pass, band-pass, and low-pass outputs. In the analog world, Rf values can be
controlled to sweep the filter over a wide frequency range. In the digital world, the filter
becomes useless at frequencies above Fs/2, but low frequency filter functions have reasonable
coefficient values.

This filter can be swept over a wide range, with the peaking frequency approximately equal
to (Fs × Kf )/(2π). The Q and gain will remain stable while the Kf value is swept.
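A behavioral sketch of the digital state-variable filter (the Chamberlin arrangement; the coefficient names are mine, with kq playing the role of the Rq damping path in Figure 8.4):

```python
def state_variable(x_samples, kf, kq):
    """Digital state-variable filter: two integrators plus a Q-setting
    feedback path, producing low-pass, band-pass, and high-pass outputs.

    Peak frequency is approximately Fs * kf / (2*pi); kq sets the
    damping (roughly 1/Q). Useful well below Fs/2.
    """
    lp = bp = 0.0
    lps, bps, hps = [], [], []
    for x in x_samples:
        lp += kf * bp                  # second integrator
        hp = x - lp - kq * bp          # input minus lp and Q feedback
        bp += kf * hp                  # first integrator
        lps.append(lp)
        bps.append(bp)
        hps.append(hp)
    return lps, bps, hps
```

At DC the low-pass output settles to the input and the high-pass output to zero, as the analog prototype would.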
If your project includes filtering, I strongly suggest taking advantage of any “tricks” that
can simplify the hardware, especially in required word widths and number of operations that
would affect processor speed.


Processing Machinery
The simplest signal processors are counter-driven state machines that repetitively execute a
predefined program. The program may be hard-wired by deriving memory addresses and
control signals from the program counter directly or through gates, or may be programmable,
where instructions are stored in a program memory called the control store. The control store
may be SRAM so that instructions can be written once, perhaps through a microcontroller
interface, whereupon the instructions are continually executed in their entirety during every
program cycle. A program cycle is often the sample period in simple signal processors.
Programs that perform such continuous processes are called algorithms. A simple architecture
is shown in Figure 8.5.
The instruction word will contain control information that sets up the hardware for that
instruction’s execution, along with addresses and data that may be used in that instruction’s
calculations. The control portion of the instruction word may be only a few bits; an instruction
decoder circuit will then deliver control information to the processor. A simple, yet very
effective processor may have only three bits of control information, allowing eight possible
hardware configurations for instruction execution. Control bits may instead couple directly to
the processor, one bit controlling whether the accumulation is to be performed, one for loading
the accumulator, and perhaps a few to directly control signal steering logic through muxes or
the enabling of tristate drivers. I/O ports can be mapped as an extension of the memory
address range.

Figure 8.5 A simple processor architecture.


Signal processors that are available as standard parts often contain many features that allow
them to be used in the widest variety of applications, and as such can become extremely
complicated to fully understand, initially set up, and use. Alternatively, a signal processor that
only does the job you need done could be constructed quite simply. A purpose-built signal
processor can easily outperform a standard part, at a fraction of the standard part’s clock
rate.
Referring back to the second version of the simple IIR filter, note that the coefficient K2
could be a simple arithmetic shift, instead of a multiply. An arithmetic shift is a right shifting
of data while copying the sign bit into the MSB. This is the correct way to diminish the value
of a 2’s complement number by a factor of two at each shifting position. With this in mind,
we can see how a filter can be constructed without a multiplier. The limitation is that only
certain cutoff frequencies are available. Examples of the cutoff frequencies for such shifted
value filters would be:

Shifts    F−3dB (Fs = 1.0)

  1       0.1103
  2       0.0458
  3       0.0213
  4       0.0103
  5       0.0051
  6       0.0025
  7       0.00125
  8       0.000623

If you simply need to obtain an average of a signal, this can be accomplished quite simply
without a multiplier. Many averaging operations can be performed with crude cutoff
specifications. The rule of thumb for such low cutoff frequencies is that F−3dB is
approximately (Fs × K2)/(2π). The register width should be equal to the number of input bits
plus the number of bit shifts employed.
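An integer model of the multiplier-free average, where the extra register width holds the shifted-off fraction (the name is mine):

```python
def shift_average(x_samples, shifts):
    """Multiplier-free one-pole average: reg += (x - reg) >> shifts.

    The register carries `shifts` extra low-order bits (the input is
    scaled up on the way in), so the fraction discarded by the
    arithmetic shift is not lost from the state.
    """
    reg = 0
    out = []
    for x in x_samples:
        reg += ((x << shifts) - reg) >> shifts   # arithmetic shift in Python
        out.append(reg >> shifts)                # back to input scaling
    return out
```

Note that Python’s >> on a negative int is an arithmetic shift (the sign is preserved), matching the hardware behavior described above.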

The Binary Point


The numbering systems we’ve discussed so far are integers, and you will find that a binary
point will need to be defined in both your data and coefficient paths, dividing these numbers
into an integer field and a fractional field. You will need the multiplier coefficient to cover the
range required, both in terms of maximum and minimum values, which will determine overall
coefficient width. The data path may require a certain amount of headroom to contain high
level interim signals without saturation limiting, while allowing sufficient resolution in the
fractional portion of the data word. Every situation is different; you may


wish to deal with signals with no headroom (all values are fractional or integers), as this
definition is arbitrary, but the multiplier coefficient binary point is essential for fractional
gains greater than unity. Your multiplier will be carefully structured to accommodate the
coefficient binary point through data shifting.
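A sketch of the binary-point bookkeeping in a multiply (Q-format style, with frac_bits fractional bits; the names are mine):

```python
def to_fixed(v, frac_bits=12):
    """Represent v as an integer holding v * 2**frac_bits."""
    return int(round(v * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits=12):
    """Multiply two fixed-point values sharing one binary point.

    The raw product carries 2*frac_bits fractional bits, so it must be
    arithmetic-shifted right by frac_bits to restore the format -- this
    is the data shifting that accommodates the coefficient binary point.
    """
    return (a * b) >> frac_bits
```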
The MAC/processor block may contain extra features that make your process more specific
to the application, reducing the required hardware and the need for high-speed logic circuits.
Consider the sample processor wiring as shown in Figure 8.6.
This is not anything special, just a structure to talk about. The control section will supply
control signals and the multiplier coefficient from the control word, along with memory
addresses and memory R/W control. The input will be from either memory or device inputs;
the output will be picked up by the same memory or output devices. REGA is the accumulator,
but REGB will represent the accumulator’s current value during the next cycle, which may
be handy in efficiently calculating

Figure 8.6 A simple processor showing data paths.


some algorithms. MUXA selects what is added to the multiplier output to become the next
accumulator value, which may be all zeros for an accumulator load operation. REGC holds a
previously read value that may be needed again later in the algorithm. MUXB selects signals
to be added to the input, if any. The XOR gates into the first adder allow 2's
complementing of MUXB's output by selectively inverting the signal while setting the carry
input to the top adder, providing a true 2's complement subtraction. The added circuitry for
such flexibility is rather small compared to the multiplier, but greatly increases the ability of
the processor to execute algorithms more efficiently. Other features can be added as needed.
Finally, the processor needs to be clocked, and connected elements such as memory and I/O
circuits need to be coordinated in terms of timing events. Establishing a master clock for your
entire chip, which runs at perhaps 2× or 4× the processor execution rate, allows additional
timing edges that can help in decoding instructions, registering memory address values,
controlling memory read/write lines, and clocking input and output signals so that system-
wide timing is not ambiguous. The power requirement for a high-frequency oscillator that
clocks a master counter is actually quite small if the signal is contained on-chip. Do not be
afraid of a high-frequency internal clock that allows many decoding states per process cycle;
timing conditions can then be more accurately defined and potential race conditions can be
eliminated.

Simple Auxiliary Processing Circuits


Here, I’ll try to detail a range of binary circuits that may be helpful in organizing a system.
I’ll simply skip through them adding comments and details as required. Many of the circuits
I’ll illustrate are suggestions intended to both introduce new concepts and stimulate your
imagination.

Floating point
Floating point mathematics is rarely required in a specifically designed process where bit
widths can be anticipated to achieve the required accuracy. In cases where bit width narrowing
is necessary, and signal dynamic range is more important than absolute accuracy, a floating
point system may be designed. It is helpful to realize that you can make the system anything
that suits your needs; it doesn’t have to adhere to an IEEE standard, only to your
requirements. Simple circuits can identify the number of leading identical MSB values in an
expanded data value; muxes can then normalize (shift) the data (left


shift until the absolute data value is greater than one-half), forming a mantissa. The number of
shifts then represents an exponent value. A 4-bit exponent can expand your mantissa by 15
bits, a 5-bit exponent by 31. A 27-bit word can fit into a 16-bit space using a 4-bit exponent,
with a mantissa numerical accuracy of 12 bits.
The circuit of Figure 8.7 can be expanded as required, but shows a method of detecting
leading sign bits and produces a 3-bit output exponent value. The XOR gates detect sameness
between adjacent bits; the NOR gates stop the output at the first instance of adjacent bit
difference. The sum of identical leading bits is then calculated through the adders. The
exponent (0:2) value can then be sent to a shift circuit that compacts the word by shifting all
data values to the left by the exponent amount, shown in Figure 8.8.
The circuit can be extended to the right by the required number of bits. It outputs the
floating point result. The expansion back to full-width words for processing is done by the
same shift method using the exponent portion of the floating point word as a driving signal to
muxes, but, of course, you must remember to sign extend the result as it is shifted.
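The detect-and-shift scheme above can be sketched in software. This is a behavioral model, not the gate-level circuit of Figures 8.7 and 8.8; the 27-bit word, 12-bit mantissa, and 4-bit exponent follow the sizing example in the text, and the function names are my own:

```python
def float_pack(x, width=27, mant_bits=12, exp_bits=4):
    """Normalize a signed fixed-point word into (exponent, mantissa).
    The exponent counts the left shifts used to strip leading sign bits."""
    mask = (1 << width) - 1
    max_shift = (1 << exp_bits) - 1        # a 4-bit exponent allows 15 shifts
    v = x & mask                           # two's-complement bit pattern
    shifts = 0
    # shift left while the top two bits are identical (redundant sign bits)
    while (shifts < max_shift and
           ((v >> (width - 1)) & 1) == ((v >> (width - 2)) & 1)):
        v = (v << 1) & mask
        shifts += 1
    mant = v >> (width - mant_bits)        # keep the top mant_bits bits
    return shifts, mant

def float_unpack(exp, mant, width=27, mant_bits=12):
    """Expand back to full width; the arithmetic right shift sign-extends."""
    if mant & (1 << (mant_bits - 1)):
        mant -= 1 << mant_bits             # interpret the mantissa as signed
    return (mant << (width - mant_bits)) >> exp
```

Small values round-trip exactly; larger values lose only the bits below the 12-bit mantissa, as expected.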

Figure 8.7 Floating point exponent calculation.


Figure 8.8 Floating point mantissa generator.

LOG and EXP


Although multiplication is straightforward in binary mathematics, division is not. When
division is required, LOG and EXP functions are useful: 1/X = EXP[−1 × LOG
(X)]. If you choose to use a base 2 system, then LOG values can be obtained by using
the floating point apparatus previously described to obtain an exponent value and a mantissa.
The resulting mantissa will be nearly correct, but not exact as a LOG2 value; a conversion
ROM can be employed to correct the raw mantissa into a more exact value. The adjustment is
slight; the ROM can be addressed by the upper few bits of the raw mantissa, and the ROM
output can be added to the whole raw mantissa as a correction factor. Remember, only positive
numbers have a valid logarithm, which simplifies the process. Exponentiation is similar to the
floating point expansion operation, with a correction ROM that is very similar but opposite to
the ROM used for correcting the LOG mantissa.
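A behavioral sketch of the normalize-then-correct LOG2 idea described above; the 12-bit fractional result and the 16-entry correction ROM (addressed by the upper 4 raw-mantissa bits) are illustrative choices, not values from the text:

```python
import math

FRAC = 12                                   # fractional bits in the result
ROM_BITS = 4                                # ROM addressed by top 4 mantissa bits

# Correction ROM: log2(1 + f) - f, sampled at each bin start, scaled to FRAC bits
ROM = [round((math.log2(1 + i / 16) - i / 16) * (1 << FRAC))
       for i in range(1 << ROM_BITS)]

def log2_fixed(x):
    """Approximate log2 of a positive integer as a fixed-point value with
    FRAC fractional bits: normalize for the integer part, then add a small
    ROM correction to the raw mantissa."""
    assert x > 0                            # only positive numbers have a LOG
    exp = x.bit_length() - 1                # integer part of log2(x)
    frac = ((x << FRAC) >> exp) - (1 << FRAC)   # raw mantissa f in [0, 1)
    frac += ROM[frac >> (FRAC - ROM_BITS)]      # ROM correction
    return (exp << FRAC) + frac
```

Even this coarse 16-entry ROM keeps the result within a few hundredths of the true base-2 logarithm, and exact powers of two convert exactly.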

Phase and frequency


Circuits that compare two logic signals to determine frequency and phase relationships are
useful in phase-locked loop (PLL) circuits, and they come in several styles. The simplest is the
XOR gate.


The average of the XOR output can be quantized by an ADC to determine the phase
relationship. If the reference signal frequency is stable, a PLL can lock to it, and transitions
from a corresponding input of varying phase can be used to capture the output of a PLL-driven
counter, quantizing the phase relationship directly.
Simple off-chip resonant circuits can act as proximity sensors that produce varying phase
outputs.

The resonant frequency of a simple coil and capacitor can be influenced by a conductive
object that effectively shunts the field of the coil, lowering its inductance, and affecting the
resonant frequency. When driven by a constant logic signal, the phase of the signal across the
resonant circuit will vary with the object's proximity, to a degree that depends on the circuit's
Q. This is a cheap and dirty way of sensing external motion without physical contact.
Quadrature signals can be developed with a pair of flip-flops.


A sense electrode can be capacitively coupled to electrodes connected to QA and QB.


Movement of the sense electrode will provide a signal, the phase of which depends on the
electrode’s position. Providing a linear array of QA, QB, QAN, QBN, QA, and so on allows
a continuing phase output as the sense electrode moves along the sequence. Noncontact linear
and rotary encoders can be made in this way. The sense electrode's signal must be carefully
carried into the IC, where it can be amplified and turned into a clean binary signal.
If the stimulus signals are derived from a binary counter, the rising or falling edge of the
sense signal can capture the counter’s contents into an output register to obtain a precision
position value.

Phase detectors
The phase and frequency of two binary signals can be determined simultaneously by a simple
two flip-flop circuit.

This is an important circuit in PLL design. Unlike the XOR phase detector, when used in a
PLL loop, it can produce an output that will force frequency locking, which is necessary for
PLL phase locking to a constant frequency input signal.
If both outputs are low, then a positive transition on one input will cause its corresponding
output to go high. Once into this condition, when the other input goes high, both outputs will
be high momentarily, and then both flip-flops will be reset to zero through the AND gate. I
have appropriately labeled these outputs ALDSB (A leads B) and BLDSA (B leads A). If one
input is at a substantially higher frequency than the other, then its output will be high most
often. When the two


inputs are of the same frequency but differ only slightly in phase, then the input that leads in
phase will cause its output to go high for a period that is proportional to the amount of
phase difference. This circuit provides ideal control signals for locking a continuous-input
PLL.
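The behavior described can be checked with a simple time-stepped model. This is an illustrative simulation of the two flip-flops and the reset AND gate, with made-up edge timings:

```python
def pfd_step(state, a_rise, b_rise):
    """One time step of the two-flip-flop phase/frequency detector:
    each flip-flop sets on its input's rising edge; the AND of both
    outputs resets both flip-flops."""
    aldsb, bldsa = state
    if a_rise:
        aldsb = 1                    # A's rising edge sets ALDSB
    if b_rise:
        bldsa = 1                    # B's rising edge sets BLDSA
    if aldsb and bldsa:              # AND gate resets both
        aldsb = bldsa = 0
    return aldsb, bldsa

def run_pfd(a_edges, b_edges, n):
    """Accumulate how many time steps each output spends high."""
    state, hi_a, hi_b = (0, 0), 0, 0
    for t in range(n):
        state = pfd_step(state, t in a_edges, t in b_edges)
        hi_a += state[0]
        hi_b += state[1]
    return hi_a, hi_b
```

With A leading B by 3 steps out of a 10-step period, ALDSB is high about 30% of the time while BLDSA stays low; if A runs at a much higher frequency than B, ALDSB dominates, which is the property that forces frequency acquisition.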

Synchronization
Asynchronous signals can be cleanly conveyed into a system-clocked input with a pair of flip-
flops.

This is a synchronization circuit that ensures that an asynchronous input signal is fully
captured as a positive pulse for a full system CLK period. The first flip-flop is set on the rising
edge of ASYNC_SIG since its D input is tied to VDD; on the next system CLK rising edge,
the condition is captured by the second flip-flop, and as the second stage’s Q goes high, the
first stage is reset. Asynchronous inputs need this synchronization so that “wild” signals
entering the system become clean and well defined in system clocking terms.
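A behavioral model of this two-flip-flop arrangement (discrete time steps stand in for continuous time; the edge positions are arbitrary):

```python
def synchronize(async_edges, n_clocks):
    """Two-flip-flop synchronizer model. FF1 is set by an asynchronous
    rising edge (its D input is tied high); FF2 captures FF1 on each
    system clock, and FF2's Q resets FF1. Returns SYNC at each clock."""
    ff1 = 0
    sync = []
    for t in range(n_clocks):
        ff2 = ff1                 # FF2 captures FF1 on the clock edge
        if ff2:
            ff1 = 0               # FF2's Q clears FF1
        sync.append(ff2)
        if t in async_edges:      # async edge during the following period
            ff1 = 1
    return sync
```

Each asynchronous edge, wherever it lands between clocks, produces exactly one clean, full-period SYNC pulse.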

Waveform Generation
As described earlier, an adder outputting to a register, with feedback from the register back to
the adder is an accumulator, or an integrator. Two integrators in a feedback loop form an
oscillator. Very accurate low-frequency sine and cosine waves can be generated by this
method. I call this a “snake chases tail” oscillator; it can also serve as an example of how to
think through a processing sequence.


This is essentially a variant of the biquad filter with an infinite Q. The coefficients for the
two multiplies should be identical but of opposite sign (note the − sign at the leftmost add),
and the structure must be started by loading one register with a zero starting value, and the
other with a full scale value. Provided that the rounding of the mathematics is not toward zero,
the oscillator will perpetually output two sine waves with a quadrature relationship. Multiplies
may be arithmetic shifts in the case of noncritical output frequencies. The algorithm is simple:
Read REG2 and load into accumulator; read REG1, multiply by a positive coefficient, and add
to accumulator; write result to REG2 while multiplying the same value by the negated
coefficient, read REG1 again and add to accumulator, then write the result to REG1.
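A quick software model shows the loop oscillates without decay. Floating-point arithmetic stands in here for the fixed-point data path; the relation k = 2 sin(π/N) for a period of N samples comes from analyzing this coupled form, and is not stated in the text:

```python
import math

def snake_chases_tail(k, n, scale=2 ** 15):
    """Two integrators in a feedback loop ('snake chases tail'): one register
    starts at zero, the other at full scale; the outputs are in quadrature."""
    reg1, reg2 = scale, 0                 # full-scale and zero starting values
    out = []
    for _ in range(n):
        reg2 = reg2 + k * reg1            # acc = REG2 + k*REG1  -> REG2
        reg1 = reg1 - k * reg2            # acc = REG1 - k*REG2' -> REG1
        out.append((reg1, reg2))
    return out
```

With k = 2 sin(π/100) the pair completes one cycle every 100 samples and returns to its starting state; the amplitude neither grows nor decays, which is why the fixed-point version only requires care in rounding, not gain trimming.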
A sawtooth waveform can be developed through a simple process, repetitively adding a
constant to a register value. The register value will build and roll over at its maximum value,
provided saturation limiting is not in the signal path. This suggests a processor that has the
capability to inhibit the saturation overflow limiter.

The frequency of the sawtooth waveform will scale directly with the constant value, but the
sharp transition during rollover gives rise to frequency components that do not fall completely
within the DC to Fs/2 range. The alias components generated can be minimized by following
the sawtooth generator with a low-pass filter and oversampling the entire process. This
means that the process is repeated numerous times per system sample period; both the signal
generation and the filtering. If the filter cuts off at Fs/2 or below, aliasing components will be
minimized, but never eliminated.
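The rollover process itself reduces to a few lines (the register width and increment below are arbitrary illustrations):

```python
def sawtooth(increment, bits=16, n=100):
    """Sawtooth by repeated addition with rollover: the register wraps
    modulo 2**bits because saturation limiting is inhibited; frequency
    scales directly with the increment."""
    mask = (1 << bits) - 1
    reg, out = 0, []
    for _ in range(n):
        reg = (reg + increment) & mask   # wraparound, not saturation
        out.append(reg)
    return out
```

An increment of 4096 into a 16-bit register gives a 16-sample period; doubling the increment doubles the frequency.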

Pseudorandom Noise
A shift register that is driven by XOR gates, with the XOR functions driven by the shift
register outputs at selected taps, constitutes a noise generator, called a linear feedback shift
register. The noise produced is a repeating sequence of (2^N) − 1 states, but the noise has a
rectangular, as opposed to Gaussian, distribution. The noise generator register never enters the
all zeros state, and must be initialized with a nonzero starting state to function.


The length of the register and the selection of taps determine the sequence, and only certain
tap positions produce a maximal length sequence of (2^N) − 1 states. Other sequences of less
than (2^N) − 1 states can be obtained by tap selection. In the example shown, changing the
first tap from the Q of the first register to the second register will deliver a sequence length of
217. The operation of such circuits is easily proven, experimentally, with a high-level
language program.
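Such an experimental proof might look like the following; the tap sets used are standard maximal-length choices for 4, 5, and 8 bits, not necessarily the taps of the example in the text:

```python
def lfsr_period(taps, nbits, seed=1):
    """Clock a Fibonacci LFSR until the seed state recurs and return the
    sequence length. `taps` are 1-based bit positions XORed together to
    form the bit shifted in; the all-zeros state is never entered."""
    state, count = seed, 0
    while True:
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1   # XOR of the selected taps
        state = ((state << 1) | fb) & ((1 << nbits) - 1)
        count += 1
        if state == seed:
            return count
```

Maximal tap sets return (2^N) − 1; moving a tap generally yields a shorter cycle, exactly the kind of experiment the text suggests.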

Serial Interfaces
ASICs often perform repetitive processes under the control of a system microcontroller.
Microcontrollers are easily programmed and are so inexpensive that including the
microcontroller function into the ASIC is uneconomical unless extreme production volumes
are expected. Further, many cheap microcontrollers today include FLASH programmed ROM;
easy programming of the FLASH microcontroller/ASIC combination can shorten system
development time. To communicate between the two parts, a communication scheme is
required. Further, if the data interface between the two parts is not required to be particularly
fast, the two-wire interface is ideal; it minimizes your ASIC pin count, which reduces costs at
the die, package, and PCB level, as well as minimizing production failures due to PCB shorts
and opens.
The simplest two-wire interface, shown in Figure 8.9, uses a clock signal generated by the
microcontroller, and a single data line that is driven at both ends by open-drain devices and a
pullup resistor, which can be incorporated into the ASIC data pad structure. Under normal
operation, only one possible transition on the data line should be expected for every clock
signal, clocking data into or out of the ASIC. Further, the meaning of the interface clock signal
can be arranged to be the rising edge (traditional) or more efficiently, the rising and the falling
edges of clock. The interface requires synchronization, which can be accomplished by the
microcontroller asserting two transitions on the data line with no intervening clock
transition—this scheme is most efficient in terms of interface speed.


Figure 8.9 Two-wire interface.

After two successive data transitions on MDA, asserted by the microcontroller, SYNC will
go high, and then low again on the next transition of MCK, allowing the resetting of data
receive circuitry. The first data bit transferred by the microcontroller could define whether the
transfer is to be programming and setup information coming from the microcontroller, or data
and status readout from the ASIC to the microcontroller. All data messages must have a
defined format, at the conclusion of which DOUT is left in a high state, the MDA driver is
off, and MDA is pulled high through the pullup resistor so that resynchronization can
occur.

Modulation Coding
When a clock signal is available, as in the two-wire interface, data transfer is clean and
unambiguous. In the case of transferring data over a medium, such as infrared, RF, or via a
single wire, the clocking and synchronization information must be buried within the data.
Numerous techniques have been developed for various purposes, but there’s always room for
a new method, which can be entirely up to you. Modulation coding schemes are fun to invent
and often come as the result of invention’s reported “mother,” necessity.
The most widely used generic coding scheme is the serial interface, where data is sent via a
single wire, at an anticipated data rate. When the receiver knows the period between expressed
bits, a start bit can indicate the beginning of a data packet, and any slight skew in
synchronization between the transmit and receive clocks can be resolved as the receiver
accepts data transitions. In a typical serial interface, such as RS-232, the transmitting and
receiving clocks can be off by a few percent in frequency and data can be reliably
communicated. In cases


where the signaling rate is less reliably known, more clever techniques need to be developed.
Note: Despite the wonderful circuits available in a CMOS IC, only through the bandgap
reference can an accurate voltage be produced; accurate currents and accurate frequencies
depend on component tolerances, which in most IC processes are very poor. Therefore,
without an external precision resonator (crystal), an RC oscillator may be accurate to no better
than ±30% from lot to lot. Each device will produce a stable frequency, but die from another
run may produce a very different frequency.
When the incoming data rate is unknown, the code must include features that allow clock
derivation from the signal format. Also, it is common for modulation codes to be DC free, so
that the signal can be conducted through a channel that cannot convey direct currents, as in
systems that are transformer coupled. DC free modulation codes do not rely on signal polarity,
but instead, the occurrence of transitions from 0 to 1 or 1 to 0, with no regard to absolute
polarity. FM coding requires a transition at the beginning and end of each bit frame, an extra
transition in the center of the bit frame in the case of an expressed data 1 value, and the
absence of such a transition in the case of an expressed 0.

The coding violation during the sync period allows the beginning of a message to be
determined. In this case, the sync pattern occupies three bit frame periods, with an equal time
high as low, to help preserve the DC free characteristic of the format. Alternatively, as an
example, the data may be coded into a format of a mandatory 1 followed by 8 data bits, where
any sequence of 9 zeros could express sync without such a coding violation.
During the reading of FM code, the absolute rate of the code may not be known, but the
rules of the code are known: a counter can determine the minimum and maximum periods
between transitions, and these counter values can be used to classify the period between
transitions as they are received. The code can then be reconstructed into a proper data stream.
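The FM rule described above (a transition at every bit-frame boundary, plus a mid-frame transition for a 1) can be modeled on half-cell samples; this sketch ignores the sync-pattern handling:

```python
def fm_encode(bits, level=0):
    """FM (biphase) encode: two half-cells per bit frame. The level always
    toggles at the frame boundary; a data 1 also toggles mid-frame."""
    out = []
    for b in bits:
        level ^= 1              # mandatory transition at frame start
        out.append(level)
        if b:
            level ^= 1          # extra mid-frame transition for a 1
        out.append(level)
    return out

def fm_decode(halves):
    """Recover bits: a 1 was sent iff the two half-cells of a frame differ.
    Only transitions matter, so the code is insensitive to polarity."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]
```

Because decoding looks only at whether the two half-cells differ, inverting the whole waveform changes nothing, which is the DC-free, polarity-independent property described above.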
FM coding is somewhat inefficient, as two transitions are required to express a single bit.
The channel may require a greater bandwidth to


support FM code than other coding schemes. Modified FM was developed to lower the
required bandwidth, and was used for years as the coding method for magnetic recording in
disk drives.

Here, the transitions have been removed at the bit frame edges, while additional transitions
are inserted between adjacent zeros. The shortest run length is one bit frame period; the
longest is two bit frame periods. A sync pattern could be inserted, perhaps a 5-bit frame pattern,
with a single transition at its center. MFM coding requires one-half of the bandwidth that FM
coding requires, but the timing accuracy is the same for both codes—MFM is a narrower
bandwidth code, but not a sloppier code in terms of timing.

PWM Output Circuits


The cheapest and dirtiest way to convert an on-chip binary value to an off-chip analog value
is through the use of a pulse width modulator. The modulator’s output will alternate between
supply and ground, and can be filtered into an average value with a simple RC filter. The
classic PWM circuit produces a signal of constant frequency with a duty cycle that depends on
input binary value, by the loading of a counter that will determine the output high period,
while a second counter determines the repeat period.

If the output is to be used directly, say, in turning on and off a heater element, this may be a
fine output. If, however, the signal is to be averaged by an RC filter into a low noise analog
value, the filter must be of narrow bandwidth or high order to reject the fundamental switching
frequency,


especially toward the middle of the PWM range, where the PWM output is nearly a 50%
square wave.
An improved logic solution is basically a first-order delta-sigma modulator (DSM) that
outputs a correct duty cycle, but not at a fixed frequency. For example, to represent a 50%
output, a high-frequency output would do as well as a low frequency, provided the duty cycle
is 50%, and filtering would be much easier. If an 8-bit PWM converter were to output 1 LSB,
the output would go high for one period for every 255 periods that the output would be low.
This is also the case with the DSM converter, but in this case, the energy of the single period
pulse is small, and is easily filtered by a simple RC.
The basic structure is that of a clocked register and an adder configured to produce an
output bit (MSB of register): subtract that bit from the input on the next clock cycle, arriving
at a new register value, and a new output bit condition. The circuit attempts to make the
average bit value output equal to the applied input data value. The adder and register are
effectively integrating the error between the applied input word and the sequential output bit
value. In practice, the circuit elements combine to a very simple circuit; the repetitive addition
of the input value, once the input is converted to an unsigned integer. A 4-bit example is given
for brevity.

In this case, the input is expected to be a 4-bit 2’s comp value, which is converted into an
unsigned binary equivalent value by inverting the MSB. While the 2’s comp value ranges
from −8 to +7, after the sign bit inversion, the value ranges from 0 to 15. If your data value is
already in unsigned binary, the inverter should be removed. Repetitive additions


Figure 8.10 Delta sigma modulator output patterns.

of the input value to the accumulated value produce a carryout from the adder, which is the
delta-sigma value, to be used directly as an output. The average of this value will be an analog
potential that spans between ground and VDD, depending on binary input code. The
maximum output will be VDD × 15/16. If full output is required, a more significant bit must
be available to force the output to all 1’s.
The inability to reach the full VDD level after filtering becomes unimportant when wider
words are modulated. The period of the repeating sequence, however, will be equal to the
clock period times 2^N, where N is the number of bits used. One final note of caution when
attempting to obtain high accuracy through the use of high clock frequencies and wide input
words: The output signal rise and fall times at the pad circuit will affect accuracy. The rise and
fall times should be closely matched, and be a small portion of the clocking period.
The value of using the DSM over a pulse width modulator is evident when we look at the
output sequences, as shown in Figure 8.10.
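The add-and-take-the-carry structure reduces to a few lines of behavioral code. The two's-complement-to-unsigned conversion by MSB inversion is included as described; the function names are my own:

```python
def dsm(value, bits=4, n=16):
    """First-order delta-sigma: repeatedly add an unsigned input value to an
    accumulator; the adder's carry-out is the one-bit output stream."""
    mask = (1 << bits) - 1
    acc, out = 0, []
    for _ in range(n):
        acc += value
        out.append(acc >> bits)      # carry out of the adder
        acc &= mask                  # register holds only `bits` bits
    return out

def twos_comp_to_unsigned(v, bits=4):
    """Invert the MSB: maps a 4-bit 2's comp −8..+7 onto unsigned 0..15."""
    return (v & ((1 << bits) - 1)) ^ (1 << (bits - 1))
```

Over a full 2^N-clock sequence the number of output 1's equals the input code, and the 1's are spread as evenly as possible (for a half-scale input the output is a square wave at half the clock rate), which is exactly why the DSM filters so much more easily than classic PWM.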

Digital Hysteresis
Noisy single-bit signals can be cleaned up with a Schmitt trigger, but outputs from ADCs may
need to be stabilized so they do not jitter between adjacent codes. For this, we need to implement
some means of hysteresis to obtain a stable output, and will sacrifice the LSB in doing so.
Hysteresis circuits are difficult to think through, so I’ve included this as a last example of
auxiliary processing circuits.


The hysteresis circuit is clocked with an input word applied; the output is held in
a register. Circuitry will compare the input and the output and decide if the output needs to be
updated to the new input value. The circuitry will determine if the current output is sufficiently
different from the input to require updating, but will always tolerate a single LSB difference
between input and output. The hysteresis circuit uses the LSB of the input word to make these
determinations, and discards the LSB in the process; if the input is N-bits wide, the output will
be N-1 bits wide. A 4-bit example is shown in Figure 8.11.
The registered output value, with an assumed LSB = 0, is subtracted from the input; signals
come from output QN terms and the adder carry is set high, effectively adding the 2’s
complement of the output to the input. If the result of the subtraction is 0, 1, or 2, then our
current output is within the LSB tolerance. If any of the MSBs are set (which could mean a
negative result), or both LSBs are set, then our current output needs to be updated. The mux
control signal goes high to register the input to the output on the next rising clock edge.
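A behavioral model of this update rule (the subtraction is done in ordinary integer arithmetic here; the hardware performs it as a 2's complement add with the carry set, checking MSBs and LSBs of the result):

```python
def hysteresis_stream(inputs, bits=4):
    """Digital hysteresis: the (bits-1)-wide output register updates only
    when the input moves more than one LSB from the output's assumed value
    (output with LSB = 0); otherwise the old output is held."""
    out = 0
    result = []
    for x in inputs:
        # hardware: 2's complement subtract; MSBs set (a negative result)
        # or both LSBs set (diff >= 3) forces an update
        diff = x - (out << 1)
        if diff < 0 or diff > 2:
            out = x >> 1              # register the input's upper bits
        result.append(out)
    return result
```

An input that jitters between two adjacent codes, say 4 and 5, produces a constant output, while a genuine two-LSB move is tracked immediately.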

Figure 8.11 A 4-bit example of digital hysteresis.

Source : ASIC Design in the Silicon Sandbox Keith Barr 217

Analog Circuit Introduction and Amplifiers


Modern technology relies on the MOSFET as a low power switching device for the
construction of high-speed computing devices. Over the past 20 years, we have seen a
remarkable transformation of our everyday lives due to the personal computer, which
would not have been possible if designers were restricted to the previous TTL
technology. CMOS devices can be scaled to tiny dimensions, and power consumption is
low.
Previous analog designs relied on bipolar technology, but the MOSFET appears to be
even more effective in these designs as well. In fact, the MOSFET is superior from almost
every point of view—zero DC input current, very high cutoff frequency, and multiple
regions of operation. The only characteristics that bipolar transistors offer that exceed
those of MOSFETs are higher transconductance and better low-frequency noise
performance. These shortcomings can be acceptably dealt with in practical MOSFET-
based analog designs.
The field of MOS analog circuits is broad, including amplifiers and comparators,
oscillators, filters, voltage references, and temperature sensors; the range is further
expanded by the use of switched capacitor techniques. We need to better understand the
strengths and the weaknesses of the devices available in a CMOS process to put them to
good use.

The MOSFET Regions of Operation


The best place to start the subject is to better understand the MOSFET. Let’s start by
looking at the NMOS device; the PMOS device behaves identically, with the exception of
terminal voltages being opposite, the

ANALOG CIRCUIT INTRODUCTION AND AMPLIFIERS Keith Barr 218

Figure 9.1 The three areas of MOSFET operation.

threshold voltage a bit different, and the lower mobility of holes (PMOS carriers) as
opposed to electrons (NMOS carriers). Essentially, the PMOS device is less conductive
than the NMOS for a given device size, but all other characteristics are the same. CMOS
truly is complementary.
Recalling my rough description in Chapter 1, the MOSFET has basically three regions
of operation: subthreshold, saturation, and linear, as illustrated in Figure 9.1. The three
areas are defined by the gate and drain potentials, both referred to the source potential.
The MOSFET used in CMOS circuits is an enhancement device, with a threshold voltage
Vt. So that we can understand the three areas of operation, we derive a second voltage
from gate potential, which we will call Vdsat.

Vdsat = Vgs − Vt if Vgs > Vt

Vdsat = 0 if Vgs ≤ Vt

In the graph, the threshold voltage is shown as approximately 0.75 V. At gate voltages
below the threshold, the device is operating in the subthreshold region, regardless of drain
potential. At gate voltages above Vt, the region of operation depends on drain potential;
when Vd is greater than Vdsat, the device is in the saturation region, when Vd is below
Vdsat, the device is operating in the linear region.
Threshold voltage is set by the process designer, by adjusting the doping levels of the
substrate, the thickness of the gate oxide, and the gate material. Vt will vary from lot to
lot, and between NMOS and PMOS.

The subthreshold region


The subthreshold region is characterized by the drain current/gate voltage relationship,
where Id changes exponentially with Vg, and is relatively independent of Vd. The drain
current will increase by a


factor of 10 for approximately 90 mV of gate voltage increase, called the subthreshold slope.
Figure 9.2 shows the drain current of an L = 1 μ, W = 10 μ NMOS device with the drain at
5 V, while the gate potential is swept between 0 and 1 V. The plot is logarithmic on the Y
axis, to show the wide range of drain current in this mode of operation. The leakage of the
device is on the order of 20 fA at Vg = 0 and the gate threshold voltage is reached at about 700
mV. The subthreshold slope appears quite straight, from about 20 fA to a few microamps,
some eight orders of magnitude; this exponential characteristic can be used to produce
multipliers and log/exponent conversion circuits, in much the same way bipolar devices are
used.
The transconductance of a device is the incremental change in drain current that results
from a corresponding incremental change in gate voltage. In the subthreshold region we see
that the transconductance varies directly with drain current—a result of our definition of
subthreshold slope. One way of looking at the MOSFET in the subthreshold region is to
imagine the MOSFET as a perfect, abrupt switching device with a resistance in series with its
source, and further, that the resistance varies with conducted current. From our plot of this
device, we can roughly calculate an effective source resistance of about 40 MΩ at 1 nA, and
observe that this resistance decreases directly with source current. We can say, therefore, that
the effective source resistance of an

Figure 9.2 Plot of drain current vs. gate voltage in the subthreshold region. [Note the log
scale on the Y axis (drain current).]


NMOS device in the subthreshold region is 0.04 Ω·A. This effective source resistance is not
the actual physical resistance of the source contacting (which also may be important); it is the
resistance of the source terminal for small voltage fluctuations when the drain is biased high
and the gate is at a fixed potential.
If we recall the bipolar device, the thermal voltage is 26 mV at room temperature (kT/q).
The transconductance of a bipolar device is approximately gm = Ic/(kT/q), but for a MOSFET
operating in the subthreshold region, the expression is more accurately gm = Id/(1.5 × kT/q).
This equation works in practice, and is far simpler than the SPICE
equations that fully model the device.
The source resistance of a MOSFET in the subthreshold region is, therefore, approximately
(1.5 × kT/q)/Id, and is independent of device dimensions; transconductance depends only
on Id in the subthreshold region. The effective source resistance is the inverse of the device
transconductance.
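Numerically, the relationships above reduce to two one-line formulas. The sketch below uses kT/q = 26 mV at room temperature, with the 1.5 factor from the text:

```python
KT_OVER_Q = 0.026  # thermal voltage kT/q at room temperature, volts

def subthreshold_gm(i_d, n=1.5):
    """Subthreshold transconductance: gm = Id / (n * kT/q)."""
    return i_d / (n * KT_OVER_Q)

def effective_source_resistance(i_d, n=1.5):
    """rs = 1/gm; note it depends only on Id, not on device dimensions."""
    return 1.0 / subthreshold_gm(i_d, n)

# At Id = 1 nA this gives rs of about 39 MOhm, matching the
# ~40 MOhm value read from the plot of Figure 9.2.
```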
In the subthreshold region, drain current is relatively independent of drain potential. Since
the ratio of voltage to current defines a resistance, we can say that the drain resistance is high
in the subthreshold region.
The plot of Figure 9.3 is of two devices, both biased to Vgs = 0.6 V, operating in the
subthreshold region. The drain voltage is swept from 0 to 5 V along the X axis, and drain
current is plotted on the Y axis. The curve with the steeper slope is an L = 1 μ, W = 10-μ
device; the flatter curve is L = 10 μ, W = 100 μ.

Figure 9.3 Plot of drain current vs. drain voltage for two device geometries. (Note the linear
scales.)


The slope of the drain current/drain voltage plot results from the depletion region around the
drain junction growing with applied voltage, under the gate, effectively making the gate length
seem shorter at high drain potentials. This is called the lambda effect. If the device’s gate
length is increased, the device will conduct less current, but the effect of the growth of the
drain depletion region will have a much smaller influence on effective gate length. The two
devices conduct approximately the same current, as their aspect ratios are the same, but the
device with the longer gate shows a higher dynamic drain resistance.
This illustrates that short gate devices will suffer from decreased output (drain) resistance, a
limiting factor in making analog circuits from short gate devices. Basically, you may fab a
circuit in 0.35-μ technology, but end up drawing devices with 5-μ gate lengths in your analog
circuits. Short gate lengths are useful in high speed circuits, but not in precision or high gain
analog ones.

The saturation region


The transition between the subthreshold and the saturation regions is not abrupt. In fact, some
SPICE models deal with the two areas independently, and if the functions that define one area
do not meet exactly with the functions that define the other, your SPICE program will have a
hard time converging to a proper result. If a gate voltage sweep on a device is not smooth and
continuous in the gate threshold region, ask your foundry for a different model.
The classic expression for drain current in the saturation region states that the drain current
will vary as the square of Vgs-Vt. That is, as the gate potential above threshold is doubled, the
drain current will quadruple. For an NMOS device of L = 1 μ and W = 10 μ, with a drain
potential of 5 V, the following plot can be made (Figure 9.4).
It would appear that the square rule is not accurate at high drain currents; at longer gate
lengths, however, the relationship approaches square. Practical analog circuits will generally operate
devices in the subthreshold mode, or in saturation mode with a gate potential that is fairly low,
near the threshold voltage, with relatively long gates where the square law applies better. The
exact calculations for the drain current are quite complicated, depending on the model, and I
will only abstract general behavior here—sufficient to gain an understanding of how to use the
devices in practical circuits. Trust SPICE and your models for the details.
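A minimal square-law sketch makes the quadrupling behavior concrete. The k value below lumps mobility, oxide capacitance, and W/L into one illustrative constant; it is an assumption, not a parameter of the device plotted in Figure 9.4.

```python
def square_law_id(vgs, vt=0.75, k=5e-4):
    """Classic long-channel saturation current: Id = (k/2) * (Vgs - Vt)^2.

    k (A/V^2) is an illustrative lumped constant, not fitted model data.
    """
    vov = max(vgs - vt, 0.0)    # overdrive voltage, Vgs - Vt
    return 0.5 * k * vov ** 2

# Doubling the overdrive (0.2 V -> 0.4 V) quadruples the drain current:
i1 = square_law_id(0.75 + 0.2)
i2 = square_law_id(0.75 + 0.4)   # i2 / i1 is 4, to rounding
```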
Figure 9.5 is the plot of drain current versus drain voltage for an L = 1 μ, W = 10-μ NMOS
device with VG = 1 V, 1.5 V, and 2 V. The slope of the plot indicates the same drain voltage
induced channel length modulation effect (lambda effect) we saw earlier with devices in the
subthreshold


Figure 9.4 Plot of drain current vs. gate voltage in the saturation region. (Note the linear
scales.)

region. Notice the far left side of the graphs, where at lower drain voltages the device falls out
of saturation, into the linear mode of operation.
The transconductance of a MOSFET in the saturation region does not increase as
dramatically with drain current as it does in the subthreshold region; it increases
approximately with the square root of drain

Figure 9.5 Drain current vs. drain voltage at three gate voltages.


current, and therefore effective source resistance falls approximately with the square root of
drain current. Unlike the MOSFET properties in the subthreshold region, for a given drain
current in saturation, the sizing of the device’s L and W parameters will affect the required
bias condition, and thus the transconductance and effective source resistance.

The linear region


When the drain voltage is lower than Vgs-Vt, the MOSFET behaves like a voltage-controlled
resistor. A plot of an L = 1 μ, W = 10-μ device is shown in Figure 9.6.
This plot sweeps drain voltage across the X axis from 0 to 100 mV, and plots drain current
up the Y axis from 0 to 250 μA. The different plots are for Vg = 1 V, 2 V, 3 V, 4 V, and 5 V.
The straight lines indicate the device is behaving like a voltage-controlled resistor; at Vg = 5
V, the device appears to be a 440-Ω resistor. Notice the most abrupt change in resistance
occurs at low gate potentials, near threshold.
When used as a resistor, the MOSFET will enter the subthreshold region when Vg is
brought below the threshold voltage, so achieving very high resistance values (where the
device is expected to behave as a linear resistor) by using a low Vg is not recommended.
MOSFETs can be used as linear resistors by employing a high gate voltage, and high

Figure 9.6 Drain current vs. drain voltage in the resistive region.


resistances can be achieved by simply making the device with a large gate length and a small
gate width. The resistance will be determined by the device’s aspect ratio (L/W), so for this
process, we can roughly say that when the source is grounded and the gate is at +5 V, the
channel appears to have a sheet resistivity of about 4.4 kΩ per square. Although the
rules governing transistor length and width are not as tight as poly resistor rules, this may be
the highest-resistivity element in your toolbox. The resistance does depend on gate to source-drain
potential, so expect the resistance to be nonlinear with applied signals.
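Under the stated assumptions (source grounded, gate at +5 V, about 4.4 kΩ per square for this process), triode-region resistor sizing is simple aspect-ratio arithmetic:

```python
CHANNEL_SHEET_R = 4400.0  # ohms per square at Vg = +5 V, this example process

def channel_resistance(l_um, w_um):
    """Triode-region channel resistance from the aspect ratio L/W."""
    return (l_um / w_um) * CHANNEL_SHEET_R

# The L = 1 u, W = 10 u device of Figure 9.6 comes out at 440 ohms;
# swapping the dimensions (L = 10 u, W = 1 u) gives 44 kohms.
```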

The Body Effect


The above plots were done with the source grounded, but in many applications the NMOS
source is at a higher potential than ground. If we assume the substrate to be grounded, the
source potential will affect the threshold voltage of the device.
Figure 9.7 is a plot of the threshold voltage of an NMOS device as the source is swept from
0 to 3 V with the substrate grounded. At a source potential of 0 V, the gate threshold is Vt0,
about 0.7 V, but at a source potential of 1 V, the threshold voltage has risen to about 0.96 V.
This is called the body effect.

Figure 9.7 MOSFET threshold variation due to body effect.
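The curve of Figure 9.7 follows the standard body-effect expression. The gamma and surface-potential values below are illustrative numbers chosen so the sketch tracks the plot (about 0.7 V at Vsb = 0 and 0.96 V at Vsb = 1 V); they are not foundry parameters.

```python
import math

def body_effect_vt(vsb, vt0=0.7, gamma=0.53, phi2f=0.6):
    """Threshold voltage vs. source-to-body reverse bias.

    Vt = Vt0 + gamma * (sqrt(2*phiF + Vsb) - sqrt(2*phiF)),
    with gamma (V^0.5) and 2*phiF (V) as assumed, illustrative values.
    """
    return vt0 + gamma * (math.sqrt(phi2f + vsb) - math.sqrt(phi2f))
```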


Capacitance of MOS Structures


The diffusions that constitute the source and drain connections to the MOSFET have area and
perimeter capacitance values defined in the SPICE model, as well as parameters that
determine how the junction capacitance varies with applied voltage. When describing a circuit
for SPICE simulation, the declaration of AS (area of source in square meters), PS (perimeter
of source in meters), AD (area of drain in square meters), and PD (perimeter of drain in
meters) is required, as these parasitic capacitances can significantly affect circuit performance.
The extract definition file that your layout tool uses to construct a netlist from a layout should
instruct the extractor to include these parameters in each device declaration.
In general, for a given simple MOS transistor drawn using minimum geometries (minimum
gate length, minimum active surround of source and drain contacts), the parasitic capacitance
of the drain region of the device will be nearly equal to that of the gate.

Gate capacitance
The capacitance of the gate to the other three device terminals (source, drain, and substrate)
will depend on the bias voltage at the gate. When the gate potential is negative with respect to
the substrate, the capacitance is largely to the substrate alone, as the inversion layer that would
normally provide a path between source and drain does not exist. There is a slight overlap of
the gate and the source and drain regions, on the order of 0.05 μ, which will always couple
these terminals to the gate. The majority of the gate capacitance during high negative bias is to
the substrate, and can be calculated from TOX thickness (plus a silicon surface value) or the
area gate capacitance value supplied by the foundry. Typical gate capacitance values for
different processes:

Process Gate capacitance


1.5 μ 1.2 fF/μ2
0.6 μ 2.5 fF/μ2
0.35 μ 4.9 fF/μ2
0.25 μ 7.2 fF/μ2
0.18 μ 10 fF/μ2
0.13 μ 15 fF/μ2
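The table translates directly into a gate-capacitance estimate (gate area times the per-process value); a small helper using the table's numbers:

```python
# fF per square micron of gate area, from the table above
GATE_CAP = {1.5: 1.2, 0.6: 2.5, 0.35: 4.9, 0.25: 7.2, 0.18: 10.0, 0.13: 15.0}

def gate_capacitance_ff(process_um, l_um, w_um):
    """Maximum gate capacitance (fF) of a W x L gate in a given process."""
    return GATE_CAP[process_um] * l_um * w_um

# An L = 1 u, W = 10 u gate in the 0.6 u process: 25 fF.
```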

As the gate potential approaches 0 V, holes within the substrate are repelled deeper into the
substrate, depleting the surface of the silicon of current conductors—the capacitance will
slowly fall as the gate potential increases. This effect varies from process to process, but a
typical 0.6-μ process will show a decrease in gate capacitance, beginning at


about −0.8 V. At a gate potential of 0 V, the capacitance is down to about one-half the
maximum, and continues to decrease slightly as gate potential increases. When the threshold
voltage is reached, the gate capacitance abruptly increases, as an inversion layer is induced
under the gate, which allows conductivity between source and drain.
When the MOSFET is in the linear region, the gate capacitance is only to the inverted layer
with the capacitance equally distributed between source and drain terminals. In the saturation
region, with a high drain potential, the majority of the gate capacitance is to the source
terminal, the remainder to the drain terminal. When a device is in saturation, the channel
induced below the gate “pinches off ” at the drain terminal. This can be illustrated as shown
here.

MOS capacitors can be used as bypass capacitors, using the capacitance of the gate to the
inverted channel with all other terminals grounded. In this service, with the gate well above
the threshold voltage, the MOSFET is operating in the linear (resistive) region. SPICE,
however, will distribute the capacitance value to the source and drain regions equally, which is
convenient and straightforward, but is also incorrect. The inverted layer that the gate
capacitance is coupled to also has resistance, which will not be shown during SPICE
simulation. SPICE will show a nearly lossless capacitance, which is far from the actual case.
When building bypass capacitors using MOSFETs, run SPICE to calculate the actual channel
resistance, and assume that the capacitance is distributed along the resistive channel. Large
MOSFETs (big squares of poly over active) will simulate as good capacitors, but in reality,
they will have large losses. Build these devices as short and wide devices to lower the
capacitor’s effective series resistance.
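A rough way to budget that hidden loss: treat the channel resistance as distributed along the plate. With source and drain both grounded, a common distributed-RC approximation divides the end-to-end channel resistance by about 12; both the divisor and the sheet-resistance value here are assumptions for illustration, not SPICE output.

```python
CHANNEL_SHEET_R = 4400.0  # ohms/sq near Vg = +5 V (example process)

def bypass_cap_esr(l_um, w_um, divisor=12.0):
    """Rough ESR of a MOS bypass capacitor contacted at both ends.

    End-to-end channel resistance (L/W) * Rsheet, reduced by ~12 for
    the distributed nature of the channel (assumed approximation).
    """
    return (l_um / w_um) * CHANNEL_SHEET_R / divisor

# A 20 u x 20 u square shows roughly 370 ohms of ESR; the same gate
# area drawn 4 u x 100 u drops to roughly 15 ohms: short and wide wins.
```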

Temperature Effects
As temperature increases, at very low drain currents, the MOS threshold voltage decreases by
approximately 2 mV/°C, while the slope of Id/Vg in the subthreshold region decreases.
This is very much like the behavior of bipolar transistors, except we find a point where the
temperature coefficient of drain current goes to zero at a gate voltage of about 0.8 V;
unfortunately, we cannot use this for voltage reference purposes, as this voltage will change
with process parameters. In the saturation region, drain current for a given gate voltage
decreases with temperature.
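The 2 mV/°C drift folds into a one-line estimate; the room-temperature threshold and reference temperature below are illustrative assumptions, not process data.

```python
def threshold_at_temp(t_celsius, vt_room=0.7, tc=-0.002, t_room=27.0):
    """First-order threshold drift: about -2 mV/degC at low drain currents.

    vt_room and t_room are assumed, illustrative values.
    """
    return vt_room + tc * (t_celsius - t_room)

# From 27 C to 100 C the threshold drops by ~146 mV in this sketch.
```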


Figure 9.8 An L = 1 μ, W = 10 μ NMOS device showing drain junction leakage at high
temperatures.

The plot of Figure 9.8 is at three temperatures: 0°C, 50°C, and 100°C. The 0° curve
intersects Vg = 0 at a few femtoamps, whereas the 100° curve is limited by drain junction
leakage to the substrate. At high temperatures, the ideal subthreshold slope characteristics are
limited. Here, we see that at 100°, the ideal exponential relationship between Id and Vg is
limited to about five orders of magnitude.
Polysilicon resistors also have temperature coefficients, but they will depend on the specific
process used to fabricate the resistive layers. Generally, high sheet resistances will have
negative temperature coefficients on the order of 0.3%/°C, while lower value sheet
resistivities will be positive, on the order of 0.1%/°C. Temperature coefficients of resistance
can cause thermally induced nonlinearity due to changing signals dynamically affecting the
local temperature of a resistor. Once recognized, this effect can be calculated and its adverse
consequences overcome by design.

Current Sources and Sinks


A current source would be one that pulls up on a voltage node, toward supply, while a current
sink would pull toward ground. I’ll refer to both as current sources, for convenience, as we
switch between NMOS and PMOS devices. As we can see from the previous graphs, any
device biased into the saturation or subthreshold region behaves as a current source, but a
perfect current source would be immune to variations in output voltage, displaying an infinite
output resistance. We can improve the output resistance by simply making the gate longer, but
with diminishing returns.


An alternative approach is to cascade devices in series, each with its own gate bias
potential.

The purpose of m2 is to provide a stable drain voltage for m1. Since gate currents are zero,
the current through both devices is identical. If m1’s drain potential does not change as the
drain potential on m2 changes, the current will be independent of output voltage.
Figure 9.9 is a plot of drain current versus m2 drain voltage for two L = 1 μ, W = 10 μ
devices in series. VB1 is set to 800 mV, and the three curves are with VB2 set to 800 mV,
900 mV, and 1 V. At VB2 = 800 mV, it is as though we have a single device with L = 2 μ and W
= 10 μ, and a correspondingly low drain current. As VB2 is increased, the drain of m1 is
brought to a potential that places both devices independently in the saturation region; the
output current doubles, and the output resistance is substantially increased.

Figure 9.9 Drain current vs. drain voltage for varying bias potentials.


This is called a cascode circuit, and can be used in high-gain amplifiers to increase device
output resistances that would otherwise limit amplifier gain. Designs using cascode techniques
require additional bias sources for the cascode devices. The cascode technique can be
expanded to three devices in series (a triple cascode), with an added bias source for the third
device, and further improvement in output resistance.
If the source is grounded, and the gate and drain terminals are connected together, the
device must be in either subthreshold or saturation, depending on current, but it cannot, by our
definitions, be in the linear mode. A current sent to this gate/drain node will bias the device to
the point where the device current equally balances the applied current. A second device of
identical dimensions will then reflect the current to a second node, in which case the circuit is
called a current mirror.

An input current I1 will pull up on VG until m1 conducts the I1 value, whereupon
equilibrium is established. If m2 is identical to m1, it will conduct an output current I2 that is
identical to I1, provided the drain of m2 is at VG potential. As we can see from our ID/VD
plots, the lambda effect will make the actual I2 output current vary slightly with m2 drain
potential, depending on device length dimensions.
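The lambda-effect error in a simple mirror can be sketched with a first-order channel-length-modulation term. The lambda values below are purely illustrative; real values come from your models and shrink as gate length grows.

```python
def mirror_output_current(i_in, vds_out, vds_ref, lam=0.05):
    """First-order mirror output with channel-length modulation.

    I2 = I1 * (1 + lambda*Vds_out) / (1 + lambda*Vds_ref);
    lam (1/V) is an assumed illustrative value, not model data.
    """
    return i_in * (1 + lam * vds_out) / (1 + lam * vds_ref)

# Diode side at 0.8 V, output drain at 5 V: ~20% excess current at
# lambda = 0.05 /V; at lambda = 0.005 /V (a much longer gate) only ~2%.
```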
The current mirror can be improved substantially by the use of cascode techniques.


In this circuit, VG1 and VG2 are derived from m1 and m3, so that bias voltages are
available to m2 and m4, and as a cascode structure, the output impedance will be very high.
This method of cascode bias generation is convenient but inefficient for many circuits, as the
potential at VG2 is greater than necessary to ensure high output impedance. The useful range
over which the output voltage may swing will be limited. In circuits that have a sufficient
supply voltage this may not be an issue, but for circuits that must run on limited supply
potentials, other cascode bias techniques must be used.

Current scaling
The current mirror only reflects the applied current correctly when both devices used in the
mirror are identical. The threshold voltage for the process is usually determined by testing
rather large devices; devices with small gate dimensions, particularly in the width direction,
will display increased threshold voltages. The effect is opposite, and only slight, in the length
direction, where the reduced threshold voltage makes the gate appear (electrically) shorter.
Currents can be scaled using different sized devices, but NEVER use device sizing to ratio
currents unless the device widths are substantially greater than the minimum that the rules
allow. A much better way to scale currents is to use arrays of identical devices.

With identical gate dimensions, accurate current ratios can be established.
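With unit arrays, ratioing becomes integer arithmetic. A trivial sketch (the unit counts are hypothetical):

```python
def scaled_mirror_current(i_in, n_ref_units, n_out_units):
    """Current scaling with arrays of identical unit devices.

    The reference leg uses n_ref_units in parallel and the output leg
    n_out_units; the ratio is set by integer unit counts rather than by
    drawn width, so narrow-width threshold shifts cancel.
    """
    return i_in * n_out_units / n_ref_units

# Four units on the reference side mirrored into one unit divides
# a 20 uA reference down to 5 uA.
```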


The effect of device width on threshold voltage can be imagined as the encroachment of
field oxide into the transistor channel. Field oxide (FOX) is grown around the active area
within which the transistors are fabricated. When the device width is small, FOX adds an
additional space between the gate and the silicon, effectively increasing the thin oxide
dimension at the transistor gate endcap areas.


This is a cross-sectional side view sketch of a very narrow width device. The growth of the
thick field oxide layer (FOX) is difficult to control, and encroaches into the device area. This
is called the “bird’s beak” that exists at each end of a poly gate as it overlaps the edges of
active. The transistor’s gate oxide becomes effectively thicker due to this effect, and
contributes to an abrupt increase in effective threshold voltage in devices with reduced width.

“Minimum simple device width” means a simple transistor constructed by crossing a
rectangle of active that is contacted at each end by a single contact with minimum active
surround. This data is typical for a 0.6-μ process. The design rules will allow narrower
transistors, but as we see, their threshold voltages go up very quickly. Predictable current
sources will be constructed with identical devices, or devices with rather large gate
dimensions.
The SPICE models supplied by your foundry may include variations to address small width
devices separately from larger devices. As can be imagined, it is difficult for a single model to
cover all possible geometries.


Bias Sources
MOSFETs can be good current sources, but we need to establish reference voltages to apply to
their gates. The simplest reference is an NMOS device that has gate tied to drain, with a
resistor to supply.

Provided the supply is stable and the characteristics of the NMOS devices do not change
appreciably from wafer to wafer, the bias current through the device will be determined by the
resistor value. VB can be used to bias identical devices to reflect approximately the same
current.
Alternatively, if the supply voltage is expected to cover a wide range, the following could
be useful.

M1 is long and narrow, so as to conduct a small current through m2. The gate of m3 is pulled
high, drawing current through R1, until m2 pulls m3’s gate down, establishing an
equilibrium condition. M4 establishes


a reference voltage for PMOS devices, and m5 mirrors this current to m6, which produces a
bias potential for NMOS devices. The currents that develop from the use of this reference
depend on the tolerances of R1, the threshold voltage of m2, and, to a small extent, the current
passed by m1. The effect of variations in m1 current is minimized by designing m2 to operate
in the subthreshold region.
A more precise bias source could be obtained by the use of a bandgap reference. The
bandgap output is approximately 1.25 V and is stable over temperature. If a bandgap reference
is available on-chip (and you can easily make this so), it would appear as seen below:

M1, m2, m3, and m4 constitute a simple amplifier that controls the gate of m5 to establish a
current through R2 by forcing the BG reference voltage across R2, balancing the amplifier
inputs. M6 establishes VRP as a reference for PMOS devices, and m7 mirrors this to m8 to
establish an NMOS reference. This circuit will operate over a moderate range of supply
voltages, and, provided the bandgap reference is accurate and stable, the reference will allow
temperature-stable currents that depend solely on the tolerances of R2.
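The resulting bias current is just the bandgap voltage forced across R2, so the tolerance of R2 sets the tolerance of every mirrored current:

```python
V_BANDGAP = 1.25  # volts, approximately temperature-stable

def bias_current(r2_ohms):
    """Current forced through R2 by the servo loop: I = Vbg / R2."""
    return V_BANDGAP / r2_ohms

# 125 kOhm yields a 10 uA bias current.
```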
Often a single bias generator is put on-chip, one that will service all of the analog circuitry
for the ASIC. In each circuit that requires a bias current setting, a single device within the
target circuit is driven by the global bias potential. Be careful when dimensioning this device,
as it must have a length and width that is identical to that of the central bias generator; in the
above case, m6 or m8. Further, dimension the current setting devices so that they are well into
the saturation region, where


slight differences in ground potential across the chip or some dynamic capacitive coupling to
the bias line will have a minimized effect on device output current. If biased into subthreshold,
a 90-mV disturbance to ground or the bias line will result in a factor-of-10 change in bias
current! Use bias devices that are as deep into saturation as possible.
Alternatively, the bandgap reference potential can be globally distributed, where it may be
used as a voltage reference, and transistor bias potentials can be developed locally. The
decision will have to do with how compact R2 can be in the process you’ve chosen. R1 is
quite noncritical, and can be an NMOS device with gate tied to supply, but to obtain a 10-μA
bias current, R2 would have to be about 125 kΩ. The entire bandgap-to-bias-voltage converter
may be a fraction of the size of R2, unless high valued resistors are available. If your circuits
are intended to operate at extremely low power, a single bias voltage source, with a single,
large R2 value may be required.

CMOS Amplifiers
The MOSFET is essentially a voltage-controlled current source; the gate voltage controls
drain current and the relationship is one of transconductance. A voltage amplifier is made by
passing the current output of a transconductance through a load resistance. Devices with the
unique combination of high transconductance and high output resistance are useful in circuits
for producing high voltage gain.
Once you have the “feel” of device characteristics, transconductance, output resistance,
and on-chip capacitances, amplifier design can become an intuitive art. To help you acquire
this skill, I will try to show various amplifiers along with their strengths and limitations.
The simplest amplifier would be a transistor and a load resistance.

If we subject this to SPICE, we can find its DC characteristics in the plot of Figure 9.10.


Figure 9.10 Transfer function of resistor loaded amplifier.

We see that the amplifier is inverting, that it operates only when the input is biased around
1.2 V, that it has a gain of about 8.5, and that its gain drops abruptly when the output swings
below about 0.5 V.
This last issue is because the transistor is falling out of saturation and into the linear mode,
where it behaves more like a voltage-controlled resistor than a voltage-controlled current
source. Operating the transistor in a mode where its transconductance is higher would give
greater gain, which means we could increase device width, lowering the required gate voltage,
and operating more toward the subthreshold region. Let’s try an L = 1μ by W = 200 μ
device, and plot the DC response in Figure 9.11.

Figure 9.11 Transfer function of resistor loaded MOSFET amplifier with wide drawn
geometry.

We see the gain has gone up, and as indicated by the input voltage, we see that we are
operating nearer to (or within) the subthreshold region. We’ve also solved the problem of
gain dropping off at low output potentials, as when operating in subthreshold, there’s no
linear region to fall into; once the gate potential has reached threshold, the output is already
nearly to ground.
To obtain higher gain, we will need to increase transconductance further, but this means a
higher operating current; a higher operating current will require a correspondingly lower
valued load resistance, which would cancel any potential gain improvement. If we’re already
in the subthreshold region, we cannot improve transconductance by sizing the device
differently. Amplifiers with simple resistive loads have single-stage gain capability that is
limited by the supply voltage; to achieve a higher gain, we need a larger drain current (leading
to higher transconductance) without lowering the load resistance.
Suppose instead we use a PMOS transistor as a load, and bias it, like the NMOS, into the
subthreshold region.

Here, we use the high drain resistance of the PMOS device to supply a relatively constant
current to the amplifier’s output. We bias the PMOS device with a potential between VB and
the VDD supply so that the current is similar to that in the resistive load situation. SPICE then
shows the result in Figure 9.12.
A substantial gain improvement results. Zooming into the plot so we may assess the gain
gives Figure 9.13.
The slope in the middle of the output range looks like a 40 mV input swing would produce
a 5 V output swing, or a gain of about 125. This is a very nice improvement over the resistive
load design. Recalling that drain resistance increases when gate length is


Figure 9.12 Transfer function of MOSFET amplifier with a MOSFET load.

increased, let’s try it again with 10-μ long gates, as plotted in Figures 9.14 and 9.15.
The bias voltage shifted a bit, but the voltage gain appears to be on the order of 400, or 52
dB, which isn’t bad for two transistors.
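The dB figure is the standard conversion from a voltage-gain ratio, 20·log10(Av); a one-line check:

```python
import math

def gain_db(av):
    # Voltage gain expressed in decibels.
    return 20 * math.log10(av)

print(round(gain_db(400), 1))  # a voltage gain of 400 is about 52.0 dB
```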
The use of a voltage-controlled current source acting against a fixed current source is the
central theme of amplifier design.

Figure 9.13 Zoomed into Figure 9.12.


Figure 9.14 10-μ gate lengths, 2 transistor gain stage.

Differential amplifiers
The preceding amplifier has limited utility, simply on account of its fixed input voltage. There
are uses for such a simple amplifier, but biasing the NMOS device becomes problematic. A
more useful, general purpose amplifier will have differential inputs so that we are amplifying
the difference between two signals, as you would with an op-amp. This is the simplest
amplifier that delivers a full output range—a 7-transistor design.
VB is supplied from a bias generator, and causes m1 and m6 to conduct the desired
operation currents. The current through m1 is split between the differential input transistors
m2 and m3. If the inputs INN

Figure 9.15 Zoomed into Figure 9.14.



and INP are at the same potential, the currents through m4 and m5 will be identical. M6 pulls
the output low with the same current that m1 pulls through the input pair. Since the currents
through m4 and m5 are each half the current through m1, in a perfectly balanced
condition m7 will need to be sized as m4 and m5 combined, as if they were in parallel.
The SPICE simulation driving the inputs differentially about a 2.5 V bias point was written
this way.
.include c:\spice\designs\sandbox0p6.mod
vp 50 0 5v
vin inp inn dc
vb vb 0 0.7v
vmid mid 0 2.5v
rb1 inn mid 1k
rb2 inp mid 1k
*
m1 1 vb 0 0 nmos l=1u w=10u ad=14p pd=22u
m2 2 inn 1 0 nmos l=1u w=50u ad=34p pd=54u as=34p ps=54u
m3 3 inp 1 0 nmos l=1u w=50u ad=34p pd=54u as=34p ps=54u
m4 2 2 50 50 pmos l=1u w=10u ad=14p pd=22u
m5 3 2 50 50 pmos l=1u w=10u ad=14p pd=22u
m6 out vb 0 0 nmos l=1u w=10u ad=14p pd=22u
m7 out 3 50 50 pmos l=1u w=20u ad=28p pd=44u
*
.options accurate gmin=1e-20 gmindc=1e-20
.dc vin -2m 2m 1u
.plot dc v(out) xlabel input yunits vout
*.plot dc i(vp)
.end


The DC response is shown in Figure 9.16.


The voltage gain is approximately 10,000 or 80 dB. There does seem to be a slight offset,
on the order of −100 μV. The offset is due in part to the declaration of m7 as an L = 1 μ, W =
20 μ device, when we already know that a device of this size will not be exactly the same as
two 1 μ × 10 μ devices in parallel (m4 and m5). The threshold voltage of m7 will be a tiny bit
lower due to the greater width, and the overall encroachment of FOX into the channel will be
less; therefore, m7 will conduct a tiny bit more aggressively. In fact, in the actual layout, m7
will most likely be two or more devices in parallel, which would solve the problem altogether.
In any case, statistical variations causing random offsets are expected, even between
“matched” devices, and will easily exceed any design offset in production (see more
about device offset matching in Chapter 7).
This amplifier has many difficulties. First of all, it is just an amplifier, NOT an op-amp. An
op-amp would be presumed to have higher gain and low output impedance, capable of driving
loads usefully. This amplifier has an extremely high output impedance, on the order of 2 MΩ.
Further, the common mode range is restricted within a 0.8- to 4.6-V range; if the two input
signals are not biased within this range, the amplifier offset begins to wander.
The amplifier draws very little current, a bit under a microamp, but could be made to draw
less, through lower biasing conditions on VB; lower bias current will result in yet higher
output impedance. In addition to offset, it will show appreciable noise, particularly at low
frequencies,

Figure 9.16 Plot of 7-transistor amplifier DC transfer function.


due to the rather small devices used. High impedance nodes within the amplifier in
conjunction with gate and junction capacitances will cause the amplifier to be slow. Finally, if
the output is fed back to the INN input, as would be allowed with an op-amp, the internal
phase shifts will cause oscillation. This amplifier therefore cannot tolerate large amounts of
feedback, and would best be described as a slow comparator. As a comparator, it has a
response time of about 1 μs from a 5 mV overdrive.
The only way to make the amplifier faster is to use higher currents to charge the circuit
capacitances more quickly, or smaller devices that have smaller capacitances in the first place.
If we want low noise, particularly at low frequencies, we will need larger devices, which is at
odds with low power and speed. If we want to apply feedback around the amplifier, we will
need some means of compensation. Finally, we may wish to drive actual loads with the
amplifier, which may require lower output impedance.
At this juncture, I have to make an important point: There is no such thing as a universal
amplifier for CMOS circuits. Unlike standard cells that have fixed logical functions, your
amplifier designs will be of all sizes and shapes, bandwidths and noise levels, and draw supply
current over a range of maybe six orders of magnitude. You may, within a particular design
environment, construct a single amplifier that can be used in many places within that IC, but
for the next project a completely different set of characteristics may be required. Expect that
every chip you design will have amplifiers that are different from those used in the last
project. Get used to the idea of understanding amplifier design so completely that the work
becomes second nature to you; only then will it become fun. Let’s call this the sandbox
attitude.
Commercial op-amps are general purpose parts, and as such, must satisfy as wide a range of
applications as possible. In each application of your ASIC, however, you have just one job that
needs to be done, and very specifically. Further, the amplifier will never stand alone; it will be
attached to other circuitry that may make it impossible to isolate the amplifier so as to
determine general purpose characteristics, such as gain bandwidth or slew rate or noise. In this
case, the amplifier becomes a part of the circuit you’re working on, and may be best tested in
SPICE as a system. You will find that while you work on the amplifier portion of your design,
you will have to load the amplifier as it is in the system, drive it with system-derived signals,
and so forth, and that testing the amplifier in isolation will lead to general mischaracterization.
A popular commercial amplifier concept today is that of rail-to-rail input common mode
range, and output rail-to-rail capability. We can see that our 7-transistor amplifier does have
rail-to-rail output swing, but


this is with no load. As a comparator, this amplifier will drive a Schmitt trigger to provide a
logic signal nicely, but its input range is somewhat limited; we could generate an internal
boosted power supply on-chip that would extend the input common mode range of a variant of
this amplifier, or we could build an amplifier with two input stages that together cover the
entire supply voltage range and tolerate the crossover distortion that will inevitably result.
However, the best solution may be to design our system so that a restricted input signal range
is acceptable. Once again: Don’t attempt to transfer standard component design techniques
into the ASIC world; here, the entire system is up to you.
The 7-transistor slow comparator shown earlier is only adequately optimized if we are
happy with it as it stands. Every modification we make to it to improve one characteristic will
inevitably affect other characteristics. We need to look into the application for the amplifier to
see where we can make acceptable trade-offs. Let’s try to make the amplifier really fast, and
see where the exercise takes us.
Speeding up the 7-transistor amplifier
Right away, we know that increasing amplifier
current and decreasing device size will speed the design up. We’ll ignore noise and offset for
now, and go for raw speed. To do this, our devices will surely be operating in the saturation
region, well away from the subthreshold region. We’ll leave the device sizes alone, and
simply jack up the bias voltage VB to 1.5 V.
The supply current is now 550 μA, and the output is not as ideal, as shown in Figure 9.17.

Figure 9.17 Transfer function of 7-transistor amplifier with a very small input signal, biased
to a high current.


The gain has reduced to, maybe, 4000 at the middle point of the output swing, which, by the
way, is not so “to-the-rails” anymore. If the output signal is intended to drive a Schmitt
trigger as a comparator, I say, “so what?” Its input common mode range, however, has
collapsed to be only useful from about 1.8 V to about 3.5 V, which may be OK provided our
inputs are centered within this range. To see how it responds to quickly changing signals,
we’ll drive it with a differential square wave and measure the prop delay as shown in Figure
9.18.
The input signals shown in Figure 9.18 are also plotted (but are hard to see) and transition
from +50 to −50 mV at the 1 ns and 6 ns points. A 15 fF load capacitance has been attached
to the output. We see that the delay is on the order of 2 ns for a rising output and almost 3 ns
for a falling output. I suspect that we can speed the amplifier up further by changing all of the
devices to a minimum gate length (0.6 μ), but leave the NMOS current sources m1 and m6 at
L = 1 μ so that the supply current stays the same, as shown in Figure 9.19.
The output rising delay is a bit over 1 ns, the falling delay is about 1.7 ns, but the gain has
probably decreased due to shorter amplifier gate lengths. This isn’t bad though, for a 0.6-μ
circuit that draws half a milliamp. The lesson here is that once we have a basic structure to
work with, we can modify it to suit our needs, accepting or rejecting any negative
consequences along the way. Let’s look closer at this amplifier and see how we might
change its structure to improve its characteristics.
The 7-transistor amplifier employs two stages of gain: the first generated by the input
differential pair m2 and m3 coupled to the PMOS

Figure 9.18 Transient response of 7-transistor amplifier with Id = 550 uA.


Figure 9.19 Transient response of a 7-transistor amplifier with smaller devices but the same
Id as in Figure 9.18.

load devices m4 and m5; the second stage is simply m7 acting against the m6 current source.
The response of the amplifier, when used as a comparator, is slowed by the high drain
resistance at m3 and m5, as this node drives the gate capacitance of m7. If we include this
terminal in the SPICE output for the most recent comparator simulation, we get the plot of
Figure 9.20.

Figure 9.20 Display of the internal node at the output PMOS gate (m7).


We see that this internal signal is slowly responding; it is the dominating time constant in
the design. We were able to obtain high gain by the use of two stages, but the cost of the
second stage loading the first stage results in slow response; that is, if you consider a few
nanoseconds slow. We could make the second stage devices smaller, reducing the load on the
first stage, but at some point, they would become the speed limiting devices, acting on the
capacitive load. Such balancing of device sizes for speed is exactly what happens when we
suffer logic delays due to fanout. This comparator looks like it could be made sub-nanosecond
for a 50 mV overdrive, provided the devices are carefully sized.

The current mirror amplifier


This structure is useful in designs that need the exponential Id/Vgs characteristic of the
subthreshold region. M2 and m3, operated in subthreshold, have their drain currents mirrored
to a common output point. If the output is terminated to mid-supply with a resistance, the gain
of the amplifier will depend on the source current through m1. As such, it is a current-
controlled amplifier. Since the current is controlled by the voltage at VB, it is also a voltage-
controlled amplifier (VCA).

An important feature that deserves mention is that the only high resistance point in the
circuit is at the output; all other nodes are either connected to device sources with an effective
resistance of 1/gm, or device drains that are tied to their gates, again with a dynamic resistance


of 1/gm. The amplifier bandwidth is determined largely by the output load capacitance. The
design, however, suffers from the lambda effects of the devices, which already have long gates
to counter the lambda effect. Despite the effort, the devices occupy a large area, and result in a
significant input offset voltage. In particular, the lambda effect at m4 causes the output to tend
toward ground, which requires an input imbalance to center the output mid-supply.
We can use cascode techniques to improve the current mirrors, reducing the offset
problem greatly, but the schematic is much more complicated. The actual layout, due to
the smaller device lengths, is nevertheless smaller than that of the previous design.

This appears quite complicated, with 20 devices and 17 nodes. With your understanding of
cascode devices, and given that the circuit is identical to the earlier schematic except for the
cascodes, it should be easily understood. The devices to the left, however, are new; they act
as bias devices for the current mirrors, producing bias potentials for the cascode devices.
The VC signal drives m5, which will set the current through the input differential pair,
determining the amplifier’s overall transconductance and the currents through all signal
branches. Device m1


produces a similar current to develop a P cascode potential across m2. M3 reflects this current
again to m4, which sets the N cascode potential. M2, m3, and m4 are made intentionally weak,
with larger lengths and smaller widths, so that the potentials across these devices will be larger
than the gate potentials of the P devices m10, m11, m14, and m15, and the N devices m5, m16,
and m19. The cascode devices act to hold the drain potentials of the devices they serve
constant. For example, m5 sets the current through the differential pair, and m6 only serves to
hold the drain potential of m5 at a stable value. If properly biased, the gate potential on m6 will
only slightly affect the current through m5. This can only be the case if m5 is in the
subthreshold or saturation region of operation. The cascode potentials are therefore adjusted
to improve the output resistance of the transistor pair; as the cascode bias is increased, the output
resistance improves abruptly (as the current-setting device enters the saturation region),
and then improves slightly further with yet greater cascode bias. The trade-off is between high
output resistance and limited signal voltage swing, since a high cascode potential will require
a correspondingly high cascode device drain potential to keep the cascode device in saturation.
It is best to adjust the cascode bias potentials using SPICE as a guide to resulting performance.
Note: Even in the subthreshold region, where Vdsat was earlier declared as zero, some
potential must exist between the source and drain terminals for current to flow. In
subthreshold, this is usually about 100 to 200 mV.
This current mirror amplifier with cascode devices can be biased to about 3.4 μA of supply
current by setting VC, showing an unloaded open-loop gain of 85 dB, with a response that
falls at 6 dB/octave beginning at about 1 kHz. Its gain-bandwidth product is approximately
15 MHz, and its offset (due to current mirror nonideality) is about 100 μV. With the output
loaded with a resistance to mid-supply, it functions as a voltage-controlled amplifier and is
quite linear up to a differential input signal of 30 mV. It has approximately unity gain with a
100 kΩ load termination. With a peak input signal of only 30 mV, the signal-to-noise ratio
will suffer, and may be limited to perhaps 70 dB over a 10 kHz bandwidth, but it will perform
nicely as a VCA.
The signal paths through the current mirror amplifier include several low-impedance nodes
that are also connected to transistor gates, which causes frequency response limitations. The
phase shift of the signals within the amplifier is such that connecting the output to the negative
input (INN) to make a voltage buffer would result in oscillation. The problem can be solved
by capacitively loading the output so that the open-loop gain of the op-amp is well below 0 dB
at the frequencies where these phase shifts occur. In this design, biased to Idd

of 3 μA, a 1 pF capacitance will suffice. There are, however, ways of producing amplifiers
with only one such gate capacitance-loaded internal node (as opposed to three in the current
mirror amp), and when operated fully differentially, such nodes become relatively
unimportant.
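The 1 pF figure can be sanity-checked against the amplifier’s transconductance: when the dominant pole is placed at the output, the open-loop gain crosses unity near gm/(2π·C), and that crossing must land below the frequencies of the internal phase shifts. A rough sketch; the gm/Id figure and the per-device current split are assumptions, not values from the text:

```python
import math

def unity_gain_freq(gm, c_load):
    """Unity-gain frequency with a dominant pole set by the output
    capacitance: f_u = gm / (2*pi*C)."""
    return gm / (2 * math.pi * c_load)

# Assumptions: gm/Id of ~25 per volt in subthreshold, ~1.5 uA in each
# input device, and the 1 pF compensation capacitance from the text.
gm = 25 * 1.5e-6                                   # ~37.5 uS
print(round(unity_gain_freq(gm, 1e-12) / 1e6, 1))  # a few MHz
```

Under these assumptions, the compensated unity-gain frequency lands well below the uncompensated 15 MHz gain-bandwidth, which is the point of the exercise.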

The folded cascode amplifier


Now that we’ve seen a complicated cascode design, this one should be easy to understand.
M1, m2, m3, and m4 develop bias potentials from VB, which in this case is a
PMOS bias potential. I’ve included currents through the VDD and GND paths as a guide,
which I suggest when designing these amplifiers. Knowing the currents through the devices
will help in establishing reasonable device sizes for a balanced design.
The current into the differential input pair, m7 and m8, is 20 μA, which will be divided
between the two transistors as individual 10 μA currents into m9 and m10 at the output stage.
The PMOS output devices m13, m14, m15, and m16 are biased to pass 10 μA each. The
summation


of these currents at m9 and m10 will result in 20 μA through each of these devices. Input
signals will change the otherwise even distribution of current into m9 and m10, affecting the
current summation at the output.
M9 and m10 are biased to conduct this current by their gate connections to the junctions of
m11 and m13, and the gate capacitance will load this node, causing an internal phase shift at
high frequencies. The design, although a bit complicated at first sight, is actually quite simple,
and performs very well from many points of view.
The schematic could be turned upside down, with NMOS exchanged for PMOS, but in this
case, the equivalent of devices m9 and m10 would need to be a bit larger (as PMOS devices
are not as conductive), and the gate loading that leads to the internal signal phase shift would
be worse. Further, in this configuration, PMOS devices are used as the input devices, which
often have better noise characteristics than NMOS, provided they are designed to operate
toward the subthreshold region, where their transconductance will be high.
One advantage of this design is that the drains of m9 and m10 are held to a fairly low
potential, on the order of a few hundred millivolts, so the input pair can easily sense
differential signals at a common mode potential of GND, or even below GND.
The sizing of the devices will determine bandwidth, noise, and output voltage swing. If the
transistors are small and operated at high current density, they will be solidly in the saturation
region of operation, and the bias potential will be high. This will restrict the output swing, but
greatly widen bandwidth. I have included the current values through the supply and ground
paths as a guide to device sizing; start by selecting a device length (perhaps 2 μ), and use a
rule for device width, like 1 μA/μ for N devices and 0.5 μA/μ for P devices. These are
effectively current densities that you will vary to optimize performance. The amplifier will
have wider bandwidth, but a more restricted output swing with a rule like 2 μA/μ for N and 1
μA/μ for P. As you narrow in on acceptable characteristics, the current notation and your
“rules” can allow a quick determination of device widths. Remember, the cascode bias
devices need current densities that are about 4 to 6 times those of the other devices, to ensure
good cascode bias potentials; these ratios should be adjusted through SPICE simulation for
optimization.
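The current-assignment bookkeeping above is easy to mechanize: give every device a branch current, pick a density rule per device type, and the widths fall out. A sketch using the rules suggested in the text (1 μA/μ for N, 0.5 μA/μ for P, bias devices run several times denser); the function names and the 5× bias factor are illustrative choices:

```python
# Current-density rules, in uA per micron of gate width (from the text).
DENSITY_UA_PER_UM = {"nmos": 1.0, "pmos": 0.5}
BIAS_DENSITY_FACTOR = 5   # bias devices run about 4-6x denser

def device_width_um(branch_current_ua, dev_type, bias_device=False):
    """Width = branch current / current density."""
    density = DENSITY_UA_PER_UM[dev_type]
    if bias_device:
        density *= BIAS_DENSITY_FACTOR
    return branch_current_ua / density

# Branch currents from the folded amplifier example above.
print(device_width_um(20, "nmos"))                    # 20 uA tail device
print(device_width_um(10, "pmos"))                    # 10 uA PMOS leg
print(device_width_um(20, "nmos", bias_device=True))  # denser bias device
```

Doubling the density rules (2 μA/μ for N, 1 μA/μ for P) halves every width, trading output swing for bandwidth, exactly as described above.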

Device Sizing, a Current Density Approach


With amplifier design in the background, allow me to present a technique for device size
selection that I haven’t seen detailed elsewhere; I call this the current density approach to
device sizing. This is a sandbox method.
We are familiar with characterizing a single device by plotting drain current against gate
voltage, or against drain voltage at a specific bias,


as we have since the invention of the vacuum tube; basically describing a specific part with
charts and graphs. The present problem in analog IC design is that we know the behavior of
specific devices, but we can draw any size or shape of device. We are more concerned with
how behavior changes with size and aspect ratio than we are with one single device.
In amplifier design, we’ve already chosen the currents at which we want to operate our
devices, based on supply current limitations, output drive current, and so forth, and we would
like to know the transconductance, gate bias voltage, and Vdsat characteristics that result from
varying device dimensions. We can quickly calculate gate capacitance from dimensions, but
these other important parameters are not immediately obvious from single device plots.
We can plot characteristics of a given transistor type (NMOS, PMOS) of a fixed gate
length, by sweeping a current through the device. This should be done on a fairly wide device,
maybe 10 μ, so that narrow width issues do not come into play. The device is connected gate
to drain with source grounded, and the current through the device is scaled according to its
width, so the X axis represents a current density in amperes/micron of device width. A few
such curves at different common gate lengths of both N and P devices will then serve as a
guide for device optimization.
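The shape of the resulting gm/Id curve can also be anticipated analytically. A common continuous approximation (an EKV-style interpolation, not from the text) writes gm/Id in terms of an inversion coefficient IC, the drain current normalized by a process-dependent technology current scaled by W/L; mapping a current density onto IC therefore needs process data. A sketch with an assumed slope factor n = 1.5:

```python
import math

N = 1.5       # assumed subthreshold slope factor
VT = 0.0259   # thermal voltage at room temperature, volts

def gm_over_id(ic):
    """EKV-style interpolation: IC << 1 is subthreshold (gm/Id near the
    1/(n*VT) ceiling), IC >> 1 is strong inversion (falls as 1/sqrt(IC))."""
    return 1.0 / (N * VT * (0.5 + math.sqrt(0.25 + ic)))

for ic in (0.01, 1.0, 100.0):
    print(round(gm_over_id(ic), 1))   # falls monotonically with IC
```

This reproduces the behavior the curves show: widening a device (lowering its current density) buys transconductance only up to the subthreshold ceiling, after which it mostly adds gate capacitance.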
The most instructive curves will be gm/Id and Vdsat. Training your SPICE program to
perform these simulations is like making it jump through hoops, but once you have plots for
the devices in your selected process, they can serve as a reference for all future designs. It’s
worth the trouble. I’ve put together a few from the sandbox 0.6-μ models to show their
value.
The relative gm (gm divided by drain current) and the Vdsat of three gate lengths (1, 2, and
10 μ) of NMOS devices plotted against current density in amperes per micron of gate width
follow. We’ll print ‘em out and then I’ll yak about ‘em.


It is extremely important when designing amplifiers that you think in terms of current
density. Choose a basic amplifier design, draw the schematic, and assign currents to each
supply and ground node. The sum of the supply currents will of course equal the sum of the
ground currents, so check to make sure you didn’t make an error. Base your current assignments on
expected output drive current, which may be required to produce a minimum voltage across a
known load resistance, or the charging of a known load capacitance at some required rate.
Choose a gate length, and refer to graphs like the two above, to get a feel for device widths,
depending on how high you would like the device transconductance to be, and how large a
Vdsat the circuit can tolerate. Large Vdsat values will essentially mean that the drain potential
cannot swing below Vdsat, or the device will fall out of saturation; amplifiers that require a
wide output swing range will need low Vdsat values.
These curves show that for 1-μ gate lengths (the right-most curve in the gm plot, the lower
curve in the Vdsat plot) at a current density of 100 nA/μ of gate width, the transconductance is
quite high, and making the device wider (lowering current density) will increase gate
capacitance, which could slow the circuit down; an increase in width will only marginally
increase transconductance. On the other hand, reducing the device width to a higher current
density will lower transconductance. At 1 μA/μ of device width, the transconductance is
falling quickly, and at 10 μA/μ the transconductance is one-fifth that of the lower current
density, and falling fast. At 100 μA/μ, the transconductance is extremely low. The use of long
gates, while increasing output resistance, will surely lower relative transconductance unless
the current densities are very low.
Hey, this is only two-and-a-half pages. Reread it until it is clear. I cannot overstress the
importance of this method, nor can I find a way to put it more simply; when you understand
this aspect of amplifier design, you will be in complete control of the process.


MOSFET Noise
There are several variables that affect amplifier design: supply current, output drive current,
signal swing at the input and output, gain, linearity, speed, and noise. So far, we’ve
illustrated three amplifier structures, and all other amplifiers will be pieces of one or another
spliced together to do what we need done. We have a rough idea of how currents charge
capacitances, delaying signals and shifting their phase, and we have techniques for modifying
designs to deal with all of these issues except noise.
The noise in MOSFET circuits is of three basic kinds. One is thermal noise, calculated from
the familiar En = SQRT (4 × K × T × R × Bw). In the case of MOSFETs, the R component
is 1/gm, the source resistance of the device, plus the series resistance of the gate material. We
know that increasing device current will increase gm, but we also know that for a given
current, gm will increase as we design devices more toward subthreshold, at lower current
densities. Additionally, once we have dimensioned devices into the subthreshold region,
further shortening and widening will not change transconductance much at all.
The series resistance of the gate material can be calculated, and if it threatens to increase
thermal noise, the gate material can be contacted at each end of the transistors, and the length
of the poly strips (width of devices) can be changed to minimize poly resistance. Effective
source resistance, however, can only be reduced by the use of high currents and devices
designed toward subthreshold operation.
Noise is best thought of as a voltage, reflected to the gate input, as though it was a signal
source in series with the gate. Knowing device gm and series gate resistance, the operation
temperature, and the measurement bandwidth, we can calculate the equivalent input noise of a
single device with SQRT (4 × K × T × R × Bw). The gate referred thermal noise voltage
times gm gives us a drain thermal noise current value.
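That calculation is mechanical once gm and the gate resistance are known. A sketch with illustrative numbers (the device values here are assumptions, not from the text):

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def thermal_noise_v(r_ohms, bw_hz, temp_k=300.0):
    """RMS thermal noise voltage: En = sqrt(4*k*T*R*Bw)."""
    return math.sqrt(4 * K_BOLTZMANN * temp_k * r_ohms * bw_hz)

# Assumed device: gm = 100 uS, so 1/gm = 10 kOhm, plus 100 ohms of gate
# poly resistance, measured over a 10 kHz bandwidth.
r_total = 1 / 100e-6 + 100
print(round(thermal_noise_v(r_total, 10e3) * 1e6, 2))  # microvolts RMS
```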
A second noise generator is shot noise, which is independent of temperature. Shot noise
reflects the quantum nature of current flow. Imagine a current of 1.6E–19 A flowing through
a wire; since the fundamental charge on the electron (q) is 1.6E–19 C, the wire is carrying, on
average, one electron per second. There will be one-second periods when no electrons are
transferred, and other one-second periods when two or more pass down the wire; only the
average is one electron per second. The RMS value of current noise that results from an otherwise constant current
is the familiar expression: In = SQRT(2 × q × I × Bw). High currents have a higher noise
component, but as current increases, the noise component only increases as the square root of
conducted current. Large background currents upon which small signal currents are imposed
will lead to reduced signal-to-noise ratio through the shot noise mechanism. This
is particularly important in photodiodes that have a background current (perhaps from ambient
light) within which a small signal is carried.
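A hedged numeric example of the photodiode situation (the background current, signal current, and bandwidth below are assumed values for illustration):

```python
import math

q = 1.602176634e-19   # electron charge, C
I_bg = 1e-6           # assumed background (ambient light) current, A
Bw = 1e6              # assumed measurement bandwidth, Hz

# Shot noise of the background current: In = SQRT(2 * q * I * Bw)
in_shot = math.sqrt(2 * q * I_bg * Bw)   # RMS noise current, A

# SNR for an assumed 10 nA signal riding on the background current
i_sig = 10e-9
snr_db = 20 * math.log10(i_sig / in_shot)
print(f"{in_shot*1e9:.2f} nA RMS, SNR = {snr_db:.1f} dB")
```

With these assumed numbers the shot noise floor is about 0.57 nA RMS, limiting a 10 nA signal to roughly 25 dB SNR regardless of how quiet the amplifier is.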
A third noise generator will also be present, due to slight fluctuations in threshold potential
that come about from charges that are intermittently trapped in the thin gate oxide. This is
known as flicker noise, or 1/f noise. The SPICE parameters Kf and Af are used during
simulation to determine noise degradation from this flicker mechanism. Flicker noise is
presented as a voltage generator in series with the gate terminal, and its effect decreases as the
device size increases. Large gate areas are required to minimize flicker noise.
The first two noise mechanisms are broadband with a normal amplitude distribution, and are
often expressed as a given power per Hertz, or alternatively in volts or amperes per SQRT
(Hz). The bandwidth (Bw) value in the calculations for thermal and shot noises determines the
resulting RMS noise value for a given bandwidth. In these calculations, it doesn’t matter
what frequency the band is measured over, it is only the bandwidth (in Hz) that enters into the
calculation. Further, uncorrelated noises sum orthogonally; that is, the sum of two independent
noise sources is the square root of the sum of the squares of each noise source.
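The orthogonal summation rule is simple to apply; a sketch with arbitrary example values:

```python
import math

# Two independent (uncorrelated) noise sources; values arbitrary, for illustration
en_a = 4e-9   # 4 nV/sqrt(Hz)
en_b = 3e-9   # 3 nV/sqrt(Hz)

# Uncorrelated noises sum as the square root of the sum of the squares
en_total = math.sqrt(en_a**2 + en_b**2)
print(en_total)  # 5 nV/sqrt(Hz), not 7
```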
Flicker noise is also called 1/f noise because this noise mechanism is frequency dependent;
its power-per-hertz value increases as measurement frequency decreases, at the rate of 10 dB
per decade. For small MOSFET devices, the noise at low measurement frequencies is far
above that of the other two noise mechanisms. For systems that need good SNR at low
frequencies, amplifiers must be built with large gate areas.
This is a plot of noise in volts/SQRT (Hz) for an L = 2 μ, W = 1000 μ NMOS device. We
can see that the thermal noise at high frequencies is very low, under 2 nV/SQRT (Hz). The
noise begins to increase, however, below about 100 kHz, rising at the 10 dB/decade rate as
frequency falls.


Increasing L to 4 μ and W to 2000 μ improves the situation, but only very large devices will
offer good noise performance at low frequencies. It is important to note that in wideband
systems, the contribution of the large 1/f noise at lower frequencies is not as bad as it may
look on these graphs; the noise between 1 and 10 Hz is indeed large, on a per root Hz basis,
but there is only 9 Hz of bandwidth between these points. In this last graph, the band between
10 and 100 kHz covers a 90 kHz bandwidth, and will contribute more noise due to its
greater bandwidth. This becomes apparent if the last graph is replotted on a linear frequency
scale from 1 to 100 kHz.

In general, for a given drain current, the 1/f noise at 1 Hz will be reduced by a factor of 2 by
increasing the gate area by a factor of 4. Since the slope of the noise is 10 dB/decade of
frequency, quadrupling the gate area will cause the “corner frequency,” where 1/f noise
begins to rise out of the thermal noise background, to lower by a factor of 4.
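This scaling can be checked against the corner-frequency definition: the corner sits where the 1/f power density equals the thermal floor, so it is proportional to the 1/f coefficient, which is taken here as inversely proportional to gate area (an assumed model, consistent with the statement above):

```python
# 1/f power density model: S_1f(f) = kf_area / f, with kf_area ~ 1/gate_area
# Corner: kf_area / fc = s_thermal  =>  fc = kf_area / s_thermal
s_thermal = 4e-18          # assumed thermal floor, V^2/Hz  (2 nV/rtHz squared)
kf_unit = 4e-13            # assumed 1/f coefficient for unit gate area, V^2

def corner(area):
    # 1/f corner frequency for a device of the given relative gate area
    return (kf_unit / area) / s_thermal

print(corner(1.0), corner(4.0))   # quadrupling the area lowers the corner 4x
```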


In low-frequency applications, speed is usually not an issue, and large gate devices can be
used. Alternatively, the technique of chopper stabilization can be employed to remove 1/f
noise as well as offset voltage.

Device sizing with noise in mind


Many circuits are noncritical when it comes to noise performance. Other circuits depend very
much on low noise characteristics. In any case, the quality of signal acquisition is always noise
limited. If the signals you process are already noisy, then handling them with extremely low
noise amplifiers may not be helpful, but lowering the quality of a signal through the use of a
noisy input amplifier destroys any opportunity to enhance resolution by processing the
signal later.
Devices that operate over narrow bandwidths at high frequencies, such as the RF input stage
of a receiver, do not generally suffer from the 1/f noise problem, but do need to have good
noise figures. This is accomplished by operating the devices near subthreshold, with large
drain currents, leading to lowered effective source resistance and low equivalent input noise
voltages. RF amplifiers are usually single devices or differential pairs, where noise
calculations are rather straightforward.
In amplifiers that are intended to operate over a wide range of frequencies, including DC,
all of the devices within the amplifier will contribute noise that can be computationally
reflected back to the inputs as additional effective noise voltage components. Usually, the
amplifier inputs are brought directly into the input differential pair. If this pair is designed to
have low source resistance and consists of large area gates, they will be the lowest noise
devices in the entire amplifier. You will not necessarily need to design all of the other devices
with such sizes and resistances, provided the input pair has high transconductance, and the
other devices, current sources, and mirrors operate at a much lower transconductance.
Remember, all MOSFET noises can be reflected back to their gate inputs as a single effective
input noise voltage, which will find its way into the amplifier as a drain current noise; the
relationship being the transconductance of the device. Smaller and generally noisier devices
can be used inside the amplifier, provided the transconductance of these devices is low,
compared to that of the differential input pair.
Large devices will have associated gate capacitances that will lower the bandwidth of
amplifiers within which they are used. Higher currents may be required in large gate area
designs to keep the bandwidth up; currents that go beyond the need for output drive capability.
Further, the use of large area input differential pairs will cause the inputs to couple to the
pair’s drain terminals, somewhat complicating the amplifier’s response. In any case, it
should be remembered that cascode
devices reflect little if any noise to the amplifier’s inputs. Use the cascode devices to lower
the impedance at the drains of the devices they serve; if those devices are held well within the
saturation region, noise inherent in the cascode devices will have a very much diminished
effect on cascode current and, therefore, noise.

Closed-Loop Stability
Amplifiers are often used with feedback networks where the amplifier’s high open-loop gain
causes the closed-loop circuit to be more accurate and deterministic. If the phase shifts of
frequencies that have significant gain become too great in the open-loop response, the
amplifier may become unstable under closed-loop conditions. Amplifiers used in feedback
circuits require some means of compensation to achieve closed-loop stability. Adequately
compensating an amplifier can be frustrating if you don’t have a good understanding of the
mechanisms involved. Here, I’ll try to explain the phase shift problem and methods through
which compensation opportunities can be identified. We will not need extensive equations,
only the understanding that an RC time constant produces a phase shift and an amplitude loss
at high frequencies, and we’ll be able to derive compensation solutions quickly. Let’s call
this a sandbox understanding.
Let’s go back to the 7-transistor amplifier, with device dimensions as shown in the
schematic, and subject it to SPICE, plotting open-loop response (in dB) and phase (in
degrees). To do this, we will apply a differential signal to the two inputs, biased to mid-supply,
and offset by a DC amount to allow the amplifier output to be near mid-supply. The amplitude
response is plotted in Figure 9.21.

Figure 9.21 AC response of the 7-transistor amplifier.


The phase response of this 7-transistor amplifier is plotted in Figure 9.22. These plots are
with no output loading.
Referring to the amplitude response, the gain at low frequencies is over 80 dB; the response
seems to fall, beginning at about 10 kHz at a 6 dB/octave (20 dB/decade) rate, and the
frequency at which the gain is 0 dB is approximately 30 MHz. The most interesting feature of
the amplitude response is the second response roll-off that appears to occur at a little over 10
MHz.
The phase response shows what’s happening. At low frequencies, the output is exactly in
phase with the input. At 10 kHz, where the first roll-off occurs, the phase begins to transition
to −90°, where it holds fairly constant until about 10 MHz, when it transitions again to
−180° and continues on, wrapping around the phase scale and moving toward 0° out at 1
GHz.
Recalling that the −3 dB frequency of an RC filter is Fc = 1/(2π × R × C ), and that the
maximum phase shift of an RC filter is 90°, we see that there is one RC at 10 kHz, which we
will call the dominant RC, another around 10 MHz, and some other dynamic signal coupling
mechanisms above 100 MHz that are causing additional phase shifts at extreme frequencies. It
appears that if the second RC were not present, the gain would fall from the first RC frequency
of 10 kHz and 80 dB of gain to 100 MHz at 0 dB of gain. The second RC interrupts this
extended frequency response.
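A single-pole sketch reproduces these numbers; the 80 dB gain and 10 kHz dominant RC come from the plots described above, and the model below assumes the second RC is absent:

```python
import math

A0_DB = 80.0       # low-frequency gain from the amplitude plot, dB
F_POLE = 10e3      # dominant RC corner frequency, Hz

def gain_db(f):
    # Single-pole roll-off: -20 dB/decade above the corner
    return A0_DB - 10 * math.log10(1 + (f / F_POLE) ** 2)

def phase_deg(f):
    # Maximum phase shift of a single RC is -90 degrees
    return -math.degrees(math.atan(f / F_POLE))

print(gain_db(100e6), phase_deg(10e3))  # ~0 dB at 100 MHz, -45 deg at the corner
```

The model confirms the extrapolation in the text: with only the dominant RC, 80 dB of gain starting to roll off at 10 kHz reaches 0 dB at 100 MHz, with the phase holding near −90° in between.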

Figure 9.22 Phase response of the 7-transistor amplifier.


Referring back to the schematic, there are two obvious high resistances driving capacitive
loads: the first being the junction of the drains of m3 and m5 driving the gate capacitance of
m7, and the second, the output driving the drain junctions of m6 and m7. In addition, the
output stage of m6 and m7 has gain, and there is a slight and unavoidable feedback
capacitance from drain to gate at m7, making the apparent capacitance of m7’s gate higher
than would be calculated by gate dimensions.
It would appear that the 10 kHz roll-off is due to the first RC, and the 10 MHz roll-off is
due to the second. If we load the amplifier with a capacitance, it is expected that the 10 MHz
roll-off (and attendant phase shift) will move to a lower frequency, as the load capacitance is
increased.
The stability problem occurs because the phase shift hits 180° (rolling around the display
window) while the amplifier gain is greater than unity. It can be confidently predicted that if
the output is tied to the negative input, attempting to use the amplifier as a signal buffer, it will
oscillate at a frequency in the 10 to 20 MHz range. If a network is connected between the
output and the negative input that attenuates the signal by 20 dB, the amplifier will be stable;
looking at the open-loop response and attenuating it by 20 dB shows that the resulting gain
will be 0 dB at 10 MHz, where the phase shift has not yet reached −180°. The amplifier/feedback
attenuator combination will have a low frequency gain of 20 dB (the feedback attenuation
amount), but it will “ring” when handling sharp transients, at some frequency near 10 MHz,
where the phase margin is inadequate for well-behaved operation. If this amplifier were to be
used in an application with a significant gain (perhaps greater than 20), it would not require
compensation at all.
If the amplifier is to be used as a unity gain buffer, it must be compensated. To do this, we
will lower the frequency of the dominant RC to perhaps 1 kHz, by a factor of 10. Since the
characteristic of an RC is to attenuate higher frequencies at 20 dB per decade of frequency, we
will expect all higher frequency features of the gain plot to be reduced by 20 dB. To do this,
we can attach a capacitor between the gate of m7 and supply approximately equal to 10 times
the equivalent capacitance of m7’s gate. Alternatively, we can use the gain of the output
stage to magnify the effective value of an added capacitance by connecting some smaller
capacitor value between the output and the gate of m7.
Experimenting with capacitor values connected this way, we find that a 50 fF capacitor
accomplishes this RC shifting to locate the dominant RC at 1 kHz. We didn’t make any
calculations; we just inserted a few values to confirm that this is the dominant RC, and that
50 fF seems to get our corner down to
1 kHz. The higher frequency features of the amplitude response are indeed reduced by 20 dB,
and the 0 dB gain point is now 10 MHz.
The phase at this unity-gain frequency is still dangerously near the −180° instability point,
so we employ one last technique to improve our phase
margin; we add a resistance in series with the 50 fF capacitor. We have introduced 10 times
the capacitive load by adding our 50 fF capacitor, which is now the controlling element that
forces a 20 dB/decade response roll-off and a 90° phase response. The second roll-off of the
amplifier output driving drain capacitance still exists at about 10 MHz, causing the phase to go
on toward 180° total. If we add a resistor in series with the compensation capacitor and make
its value such that it equals the compensation capacitor’s reactance at 10 MHz, the dominant
RC will stop its attenuation at this point, where the second RC begins. The resistance works
out to about 300 kΩ. With this combination in place, our phase margin increases to an
acceptable value.
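The series resistance can be sanity-checked as the reactance of the compensation capacitor at the second RC frequency; with the 50 fF and 10 MHz values from the text, R = 1/(2π × f × C):

```python
import math

def zero_resistor(f_second_rc, c_comp):
    # Choose R equal to the compensation capacitor's reactance at the second RC
    # frequency, so the dominant RC stops attenuating where the second RC begins
    return 1 / (2 * math.pi * f_second_rc * c_comp)

r = zero_resistor(10e6, 50e-15)
print(f"{r/1e3:.0f} kOhm")   # ~318 kOhm, i.e. "about 300 kOhm"

# Loaded case from the text: the second RC falls as the compensation capacitor
# grows (a 4x shift is assumed here), so the required resistor is unchanged
r_loaded = zero_resistor(2.5e6, 200e-15)
```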
The phase margin is defined as the difference between −180° and the amplifier’s phase
shift at the 0 dB gain frequency. This design looks like it has about 60° of phase margin at
the 0 dB point, which is at 10 MHz. Sixty degrees of phase margin will allow a reasonably
well-behaved transient response.
The amplifier draws half a microamp of current, has a gain bandwidth of 10 MHz, and a
low frequency gain of 80 dB, unloaded. Not bad, eh?
A few points deserve mention: The capacitor is attached to the output, with the resistor to
m7’s gate. As a double poly capacitor, we are careful to connect the bottom plate of the
capacitor to the amplifier output, which is expected to drive something that will have some
associated capacitance already. The bottom plate of the compensation capacitor will simply
act as an additional output load capacitance. The resistor, however, is of a very high value, and
will be large, with high stray capacitance to substrate unless the process has a poly layer
available with a high sheet resistivity. Most amplifiers of such low power consumption are
only attempted in processes that have high sheet resistivity layers
available. Further, most amplifiers in general will be biased to higher currents, where a
smaller compensation resistance will probably suffice. Be sure to calculate the stray capacitance
to substrate of your compensation resistor, and divide it into two parts, one attached from each
end of the resistance to ground.
The resistor was added to improve phase response around 10 MHz. The second RC that was
considered in calculating the compensation resistance was the amplifier output resistance
driving the drain load capacitance, at about 10 MHz. If we add a load capacitor to the
amplifier output, this RC will fall in frequency, again threatening stability. The compensation
capacitor will have to be increased to regain stability, but the resistance value will not change:
the second RC frequency has fallen by the same factor that the compensation capacitor has
increased, so the time constant remains correct without changing the resistor value. For a 500 fF
load capacitance, a compensation capacitor value of 200 fF is required, along with the 300 kΩ
resistor.
Compensation can be added in this way to any amplifier, with proportionate reduction in
certain aspects of performance, in this case, lowering of the unity gain frequency.

A simplified approach to compensation


Identify the dominant RC in the design and any secondary ones that distort the phase shift,
threatening stability.
Lower the frequency of the dominant RC with an added capacitor and a series resistor that
compensates for any higher frequency time constants, until adequate phase margin is obtained.
Be sure to do this with expected output loading in place.
For closed-loop unity gain stability, the ratio of the first RC (dominant) and the second RC
should be at least the DC gain of the amplifier, or compensation will be impossible. In this
case, we needed to increase the first RC (lowering its frequency) by a factor of 10 to get a
large enough ratio.
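As a quick check of this rule using the numbers from this design (80 dB DC gain, second RC near 10 MHz):

```python
A0 = 10 ** (80 / 20)        # DC gain of 80 dB as a ratio
f_second = 10e6             # second RC frequency, Hz

# For unity-gain stability the pole separation must be at least the DC gain,
# so the dominant RC must sit no higher than f_second / A0
f_dominant_max = f_second / A0
print(f_dominant_max)       # about 1 kHz, the target used above

# The uncompensated dominant RC was at 10 kHz, hence the factor-of-10 shift
factor = 10e3 / f_dominant_max
```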
Once you get a feel of the process, you will not need to isolate the amplifier as a separate
unit for analysis; you will be able to compensate amplifiers when they are surrounded with
application-specific circuitry. Your ability to do this on-the-fly will improve with experience.
Always remember to add stray capacitances wherever they occur in the layout. As an example,
the compensated 7-transistor design can be used as a buffer by tying the output to the negative
input, but this will cause a capacitive load on the output, requiring a change to the
compensation capacitor value. Capacitive loads on the buffer will further change its
characteristics, requiring yet other compensation tweaks. Learn to compensate amplifiers in
the application, only rarely in isolation.
Finally, your choice of compensation values will most likely not be done with an AC
analysis, but with a transient analysis instead, with the
amplifier embedded into the circuit it supports. You will be trying to achieve quick settling of
the output when stimulated with a transient disturbance. Some brief overshoot, followed by a
well-damped oscillation may be acceptable. Be sure to remember that the process will vary
from wafer to wafer, and make sure that over the expected range of capacitance and resistance
values, and amplifier bias conditions, the amplifier performs adequately.
Overcompensating the amplifier will lead to sluggish performance, while undercompensating
will lead to ringing. Your efforts will lead to a suitable trade-off between the two extremes,
over the expected range of component tolerances.

Driving Resistive Loads


The classic method of driving large output currents, used in older discrete op-amps, is to use a
bipolar emitter-follower output stage, with the devices biased into AB operation to lower
quiescent current. If we attempt this with MOSFETs, we will suffer the same problem of the
older bipolar parts; that of restricted output swing range. In the case of Vdd potentials of 5 V
or less, and threshold voltages in the order of a volt, the output swing range becomes severely
limited with this approach.
Alternatively, PMOS can pull outputs high and NMOS low, but the drains of these devices
will present a high impedance to the load, and a significant RC time constant will result while
driving large capacitances. The use of cascode techniques, as in the folded cascode design,
causes the dominant RC to exist at the output node, while the higher frequency time constants
are within the amplifier itself. These circuits can be operated in class A mode, where the
quiescent bias currents of the devices represent the maximum allowable output current.
Capacitive loading acts to lower the amplifier’s bandwidth, actually improving stability in
feedback situations.
The earlier defined 7-transistor compensated amplifier can drive resistive loads, but the
open-loop gain is reduced significantly in the process. A 100 kΩ load resistance drops the open-
loop gain to 38 dB, a megohm to 58 dB. This may be adequate, depending on the application.
Higher bias currents will obviously lead to greater drive capability.
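These gain figures are consistent with a simple output-resistance model, A = gm × (Ro ∥ RL); the Ro and gm below are inferred from the quoted 80 dB and 58 dB points (an assumption for illustration, not values given in the text), and the model then reproduces the 100 kΩ figure:

```python
import math

def parallel(a, b):
    return a * b / (a + b)

A0 = 10 ** (80 / 20)        # unloaded gain, as a ratio
A_1M = 10 ** (58 / 20)      # gain with a 1 MOhm load, as a ratio

# From A0 = gm*Ro and A_1M = gm*(Ro || 1 MOhm), solve for Ro and gm
Ro = 1e6 * (A0 / A_1M - 1)      # ~11.6 MOhm output resistance (inferred)
gm_out = A0 / Ro                # ~0.86 mA/V output-stage gm (inferred)

# Check against the third quoted point: a 100 kOhm load should give about 38 dB
a_100k_db = 20 * math.log10(gm_out * parallel(Ro, 1e5))
print(f"{a_100k_db:.1f} dB")
```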

A high output current, low bias amplifier


There are applications where wide output swing at high currents is required, but the
continuous bias current of class A operation would lead to unacceptable quiescent power
dissipation. In these cases, a suppressed bias design can be considered. The next amplifier
design produces small drive voltages to massive output devices, and under zero
input signal, the bias current is on the order of a milliamp or less. When the input signal
deviates from zero, one or the other output device is turned on. All devices in the design have
1-μ gate lengths, and the widths (in microns) are printed above each device. The peak output
swing with a 10 Ω load to mid-supply swings within a few hundred millivolts of supply and
ground.
The basic technique is based on the eight load devices, m4 through m11, that are cross-
coupled to allow large positive swings at nodes 2 and 3 only under imbalanced conditions.
When the current through the differential input devices is equal, the loads keep nodes 2 and 3
at near-threshold voltage levels.
The stability setting RC is at nodes 2 and 4, and the large capacitance of the output device
gates adequately slows these nodes to provide stability.
The amplifier can be used as a differential output driver, based on signals that are biased to
a mid-supply that is easily produced from a resistor divider.


Wideband Amplifiers
For receiving narrowband high-frequency signals, tuned RLC matching networks can make
use of the gate capacitance of MOSFETs to advantage, but wideband circuits must rely on
resistive loads to operate down to DC. A simple pair of biased devices can provide a gain
module that can be tailored to suit your needs.

The amplifier has both differential inputs and outputs. The gain will depend on the value of
the load resistors and the transconductance of the input differential pair. A bias generator
could be produced that derives VB as a function of the load resistor values, so that gain can be
better controlled from lot to lot. Obviously, high gain can be obtained by designing the input
differential pair toward the subthreshold region, but this will mean large drain area values,
which will slow the amplifier down. The output source followers m5 and m7 provide buffered
outputs to better drive the loading capacitance of succeeding stages. Note that a source
follower (m5 or m7) will present a capacitive load at its input, which is largely a capacitance
from gate to source; since the source is closely following the gate input, the input loading is
slight. The output resistance of the source followers drives any succeeding input capacitance
more aggressively.


Several such stages can be cascaded, and the bandwidth of each stage will depend on gain;
higher gains will mean large load resistor values and large m2 and m3 drain capacitances. The
frequency response plot of three such cascaded stages is shown here.

The current sources m1, m4, and m6 are L = 1 μ, W = 10 μ devices, and the other devices
are L = 0.6 μ, W = 40 μ. The load resistors are 1 kΩ, VB is 2 V, and the three amplifiers
together draw a few milliamps. The gain at the output of the third stage is about 22 dB, and is
3 dB down at about 2 GHz. Achieving high gain at wide bandwidth usually means the use of
multiple stages like this.
If you have an amplifier design that needs additional gain, consider using multiple stages
within the amplifier, provided a single RC is available within the amplifier (or perhaps at its
output) to stabilize the combination. All additional gain stages must have wide bandwidth so
that the RC frequencies they demonstrate do not threaten closed-loop stability.
These are the basic structures of amplifiers and some techniques for designing them for
specific applications. In later chapters we will be using amplifiers in different ways, which
will further reveal potential applications, difficulties in practice, and overall design techniques.


The Bandgap Reference


The bandgap reference uses two opposing bipolar transistor characteristics to provide a
reference voltage that is stable over temperature and process variations. In a simple
CMOS process, the dedicated collector PNP device is most often used, connected as a
diode junction to ground.

The voltage-current relationship of a PNP transistor with base and collector grounded,
with an emitter size of 20 μ by 20 μ, is plotted for 0°C, 50°C, and 100°C, as shown in
Figure 10.1.
We see that as the applied voltage increases, the emitter current increases
exponentially. Even at 100° (the uppermost curve), the current range over which the
plot appears straight spans at least seven orders of magnitude. The characteristic is similar
to the MOSFET in the subthreshold region, but the MOSFET has a slight curvature,
departing


Figure 10.1 Emitter voltage of vertical PNP at 0°, 50°, and 100°.

very slightly from precise exponentiality. Further, the MOSFET has gate threshold
variations that are difficult to control with high precision.
The dedicated collector PNP, or vertical PNP, is drawn as concentric rings, usually
square, with P substrate contacting around the outside (collector), a ring of N contact to the
well (base), and in the center, a P diffusion acting as the emitter. It is the emitter
dimension that sets the characteristics of the device; emitter area is a parameter in the
SPICE model.

Check your model to see if it is area dependent or intended to be used as a set-sized
device. The difference is obviously extreme; the plots you obtain from SPICE
should show a voltage drop of about 600 mV at a microamp for small transistors. The
layout of the vertical device is shown in Figure 10.2.
The layout of the PNP device often requires a large substrate contact area around the
outside, so that abutted devices do not violate the well-to-well spacing rule.
The current density through the vertical PNP should not be too great, or the resistive
nonidealities of the structure will interfere with the otherwise precise exponential PNP
characteristic. Design your circuits, allowing for emitter currents that are in the range of
250 nA/μ2 of emitter area, or lower.
The three plots of Figure 10.1 show how the slopes of the PNP curves differ with
temperature. At 0°, the current increases by a decade for every 54 mV of emitter voltage
change, but at 100°, this is about 74 mV.
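These slopes follow directly from the diode equation: one decade of current corresponds to an emitter-voltage change of ln(10) × kT/q (assuming an ideality factor of 1):

```python
import math

k = 1.380649e-23       # Boltzmann constant, J/K
q = 1.602176634e-19    # electron charge, C

def mv_per_decade(t_celsius):
    # ln(10) * kT/q: emitter-voltage change per decade of emitter current
    t = t_celsius + 273.15
    return math.log(10) * k * t / q * 1000

print(mv_per_decade(0), mv_per_decade(100))  # ~54 mV and ~74 mV
```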


Figure 10.2 Layout of bandgap transistor cell.

Accordingly, at high current levels, the temperature coefficient is −2 mV/°C, and at lower
currents, perhaps −3 mV/°C.
For a single device, the emitter voltage required to conduct a specified current will decrease
with temperature. The same device, operating at a lower current, will show a greater decrease
in emitter voltage with temperature.
If an array of many identical transistors is wired in parallel and driven with a constant
current, and a separate single transistor is driven with a separate but identical current value,
the single device will have a greater potential across it than the array of devices. The
transistors in the array each carry a fraction of the supplied current, while the single transistor
carries the entire current. The emitter potential of the array and the emitter potential of the
single device will both decrease with temperature, but the array emitter potential will decrease
more dramatically because devices at low currents have a greater temperature coefficient. The
difference between the two potentials will increase with temperature.
The bandgap reference exploits this basic characteristic by amplifying the difference
potential that increases with temperature, then adds this amplified value to the potential across
a transistor that is biased with a current. The result is a temperature stable reference voltage.
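To get a feel for the amplification required, note that the difference potential grows at (k/q)·ln(N) volts per kelvin, while a transistor's emitter voltage falls at roughly 2 mV/°C (a textbook figure, assumed here rather than taken from the plots). A sketch, assuming the 10:1 array ratio used later in this chapter:

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, V/K
VBE_TEMPCO = -2.0e-3       # V/K, assumed emitter-voltage tempco
N_DEVICES = 10             # array-to-single-device ratio

# PTAT slope of the difference potential: d(dVbe)/dT = (k/q) * ln(N)
ptat_slope = K_OVER_Q * math.log(N_DEVICES)  # ~0.198 mV/K

# Gain needed so the amplified PTAT slope cancels the emitter tempco
gain = -VBE_TEMPCO / ptat_slope
print(round(gain, 1))  # roughly 10
```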
The SPICE plot of voltages and currents shows a resistive limitation at high currents, a
characteristic of small semiconductor structures. However, since we know that the temperature
coefficient of emitter voltage decreases as the current increases, evidenced by the plots, we
could imagine that at extremely high currents, the temperature coefficient would be zero. If we
extend the straight portions of the three


curves upward and to the right, we can see that they will intersect at a much higher voltage
and at quite an extreme current; at this point, the temperature coefficient would presumably be zero.
In fact, the three lines do not intersect at a single point: the 0°C and 50°C lines intersect
around 1.23 V, and the 50°C and 100°C lines intersect at a slightly lower potential. This
implies that there is no single zero-TC bandgap voltage; the zero-TC point will itself
depend on temperature. In practice, while building bandgap references, we will find this to be
the case.

Bandgap Design #1—Basic Principle


The best way to show a simple bandgap structure is with current sources and an amplifier.

M1, M2, and M3 are identical, and pass drain currents into their connected devices
according to the amplifier output, which drives the PMOS gates. Q1 and Q3 are single PNP
devices, while Q2 is 10 such PNP devices in parallel. For any given PMOS current, the
voltage across Q2 will be lower than that across Q1. In operation, equilibrium is established
with PMOS currents such that the voltage at node 1 equals the voltage at node 2. This places
the difference in emitter potentials of Q1 and Q2 across R1. The current will then increase as
the temperature increases, because Q2 is operating at a lower current density than Q1, and has
a higher temperature coefficient. M3 passes this current through the series combination of R2
and Q3, producing the bandgap output voltage of about 1.2 V. As temperature increases, the
voltage across Q3 decreases according to its temperature coefficient, while the


Figure 10.3 Arrangement of bandgap transistors.

current through R2 increases, producing a compensating voltage across R2. A constant output
voltage results.
The ratio of Q1 and Q2 sizes is best set by making all transistors exactly the same size, and
simply arraying some number of them to produce Q2. The number of devices used in making
Q2 can be anything you like; the X10 indicated above is chosen for convenience only. Making
Q2 a larger array will increase the potential across R1, allowing better production tolerances
and making the unavoidable amplifier offsets more acceptable. It is imperative that the devices
all operate at the same temperature, so they are usually built as a block of devices, tightly
packed together. The schematic shown could be a 12-device array, arranged so that the high
current density devices Q1 and Q3 are surrounded by Q2 devices, as shown in Figure 10.3.
Many different bandgap references can be built using this understanding of operation—each
to best suit the application at hand. Different structures will be needed depending on available
supply voltage, budgeted supply current, supply voltage variation immunity, size, availability
of high-valued resistors in the selected process, and so forth.

Bandgap Design #2
When supply voltages permit, a simple circuit can be used, which can also illustrate problems
found in virtually all bandgap circuits. This is shown in Figure 10.4.


Figure 10.4 Simple bandgap circuit.

The circuitry to the left is often not included the first time a designer gives it a try; the
second time, however, it will be designed in. This is a start-up circuit that ensures the bandgap
circuit always begins operation correctly. The problem is that most bandgap circuits are
bistable; they will find an equilibrium condition, but only once sufficient current is flowing to
begin operation. If the start-up circuit is left out, leakages can often keep the devices
essentially off, and the bandgap output is zero.
M1 is a very long and narrow device that pulls lightly down on node 1, whereupon M3 acts
as a source follower to pull down on node 2. Once the circuit is in operation, M2 turns on to
shut off the start-up current through M3. M2 must be a strong device with a short and wide gate
to ensure that once in operation, the start-up circuit is disabled. M4, M5, M6, and M7 constitute
the entire control circuit, and M8 drives a current through R2 and Q3 to produce an output
voltage.
The problem with this circuit is its high supply sensitivity. It can be made to start up
reliably, it is easy to understand and build, and it has very few elements, but the lambda effects of
the single devices make the output change when the supply voltage changes. Although
perhaps adequate for crude applications that need only a rough reference voltage, this will not provide a
precise one. It is, however, a good model to use in discussing bandgap issues.


Figure 10.5 Plot of output vs. supply voltage as bandgap reference starts up.

Referring to Figure 10.5, the start-up circuit seems to have kicked the circuit into action at
maybe 850 mV, but the output isn’t stabilized until a supply voltage of maybe 1.8 V. This is
due to the Vdsat characteristics of the rather long gates employed. The slope of the output with
supply potential is pretty bad, especially considering the excellent temperature stability if the
supply is fixed at 5 V. A temperature sweep at VDD = 5 V is shown in Figure 10.6.
This temperature sweep was conducted with R1 = 60K and R2 = 572K. The ratio of these
values, along with the number of devices used in Q2

Figure 10.6 Temperature affecting the bandgap circuit’s output voltage.


will determine the temperature coefficient of the entire circuit; a different number of Q2
devices will require a different R2/R1 ratio.
When adjusting resistor values to attain a zero-TC solution, remember that the transistor
voltage component is causing the output to fall with temperature, and the current-through-
resistor component is causing the output to rise with temperature. Keeping this in mind will
allow you to zero in on an acceptable R2 value quickly.
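As a sanity check on the values above (R1 = 60K, R2 = 572K, a ten-device Q2), the output can be estimated by hand. The Vbe figure below is an assumed placeholder, not a simulated value:

```python
import math

K_OVER_Q = 8.617333262e-5  # V/K
R1 = 60e3    # ohms, sets the PTAT loop current
R2 = 572e3   # ohms, converts that current back into a voltage
N_DEVICES = 10

def bandgap_output(temp_c: float, vbe_assumed: float) -> float:
    """Output = Vbe(Q3) + (R2/R1) * Vt * ln(N).

    vbe_assumed stands in for the process-dependent emitter voltage.
    """
    vt = K_OVER_Q * (temp_c + 273.15)
    return vbe_assumed + (R2 / R1) * vt * math.log(N_DEVICES)

# With an assumed Vbe of 0.65 V at 25 C, the output lands near 1.2 V:
print(round(bandgap_output(25.0, 0.65), 2))
```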
The tempco is ±0.01% from 0°C to 100°C, which looks nice here, but would rarely be
encountered in practice. The bandgap temperature coefficient will always have this parabolic
shape. Adjust R2 to center the peak on the midpoint of your expected die temperature range.
To do so, you will find the value of R2 is very critical. In this case, R2 is 573 KΩ and the peak
is at 45°C. If R2 shifts to 572 KΩ, the peak will occur at 25°C. This represents a 0.175%
change in R2’s value. The same amount of peak shift is found if both resistors change
together by +2.7%. Therefore, expect the zero-TC peak to shift considerably over the absolute
resistance tolerance of your process, and pay close attention to the matching of R2 and R1
through careful resistor layout and placement.
This example is perhaps a poor one, because of its awful supply sensitivity—we may be
overexamining an initially flawed design. Let’s try to improve the basic structure by adding
more gain to the system, perhaps with cascode devices.

Bandgap Design #3
Adding cascode devices to the simple Design #2 should improve the resulting supply voltage
immunity, but it will also complicate the circuit considerably. It will also require a higher
supply voltage to keep the devices all in saturation. Figure 10.7 shows these modifications.
Here, we’ve added a cascode bias generator with M4-M7, for supplying the P cascode
devices M12 and M15. We’ve also used the inefficient technique of developing a single N
cascode bias for M11 with M10. The results are interesting, plotted in Figure 10.8.
The output is flatter, but only from Vdd = 2.9 to 5.0 V. The flatness is still not excellent: the
supply rejection is perhaps −46 dB, since a 2 V supply change, from 3 V to 5 V, results in a
10 mV change in output. The temperature coefficient is also plotted in Figure 10.9.
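The −46 dB figure is simply the ratio of output change to supply change, expressed in decibels:

```python
import math

delta_out = 0.010   # V, observed output change
delta_supply = 2.0  # V, supply swept from 3 V to 5 V

rejection_db = 20 * math.log10(delta_out / delta_supply)
print(round(rejection_db, 1))  # about -46 dB
```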
The temperature coefficient is very good, but the bandgap output is not what we expected; it
is perhaps 1.05 V. Nonetheless, SPICE indicates this is still a reliable voltage source, and
stable over temperature. The lesson here is this: Every bandgap circuit will produce a different
output voltage—the result of slight variations of the circuit elements with temperature. If the
circuit produces a voltage that is temperature-stable


Figure 10.7 A bandgap reference with cascode devices.

Figure 10.8 Bandgap design #3 output vs. supply.


Figure 10.9 Temperature response of bandgap design #3.

and predictable, its output does not need to be a particular idealized value. Because the output
voltage is so different from what is expected, we should assume that unexpected temperature
variations within the circuit are not only large, but perhaps variable with process. A thorough
examination may be required, varying device dimensions slightly and analyzing over different
supply voltage conditions to guarantee that the design really is predictable.
The preceding circuit has better supply rejection, but perhaps not enough, and can only be
used with a supply voltage of 3 V or greater. We will try to introduce more controlling gain
into the circuit, and attempt to lower the required supply voltage.

Bandgap Design #4
The circuit of Figure 10.10 has a true amplifier, a diff pair of M7 and M8, designed to operate in
subthreshold for the highest transconductance. These devices are made large to minimize the
1/f noise that would otherwise appear at the output, multiplied by the high gain involved; the gain from
the amplifier input terminals to the output is about 10. The load devices M9 and M10 are intentionally
designed further into the saturation region, with lower transconductance, so they can be small yet
contribute little noise to the amplifier.
The transistor array is nine devices, which can be arrayed as a 3 × 3 block, with Q1 in the
center for good thermal matching.
The output uses a resistor in series with the Q2 path, so that Q2 can also be used as the
negative temperature coefficient part of the output circuit.
The entire circuit is self-biased, with the aid of the start-up devices M1, M2, and M3. The
use of a PMOS idle current source, M1, allows a lower current for a given device size, which
is a space advantage.


Figure 10.10 Bandgap design #4 circuit.

The circuit requires compensation, which is provided by the 2 pF cap and the 50K
compensation resistor. The output is plotted against supply voltage in Figure 10.11.
This has a remarkably flat output, from about 2 to 5 V. The addition of the high gain
amplifier very much isolates the output from supply

Figure 10.11 Bandgap design #4 output vs. supply.


Figure 10.12 Zoomed in on the flat section of Figure 10.11.

voltage variations. Also, the output current path is within the circuit’s feedback loop. Figure
10.12 shows a zoomed-in view of the flat section of Figure 10.11.
We see that the output voltage is extremely stable with supply potential.
The temperature plot is shown in Figure 10.13.
Notice that the two resistors that control the circuit current and the output voltage have a
common point. This allows our precision resistors to be grouped together, and also allows us
to adjust the bandgap with a tapped resistor. The layout for the resistor should have several
possible taps brought up through metal contacts so that by shorting taps, the bandgap reference
can be trimmed prior to production tape-out. When the prototypes come in, careful analysis of
the circuit over temperature can allow experimentation by shorting out various taps until a
more correct resistor combination is determined.
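Choosing which taps to short is a small combinatorial search. A hypothetical sketch; the segment values and target below are invented for illustration:

```python
from itertools import combinations

# Hypothetical resistances between successive taps, in ohms.
SEGMENTS = [5e3, 10e3, 15e3, 20e3, 50e3, 100e3, 200e3, 172e3]
TARGET = 540e3  # desired post-trim series resistance, ohms

def best_shorting(segments, target):
    """Find the segment indices to jumper out so the remaining
    series resistance is closest to the target."""
    best = (float("inf"), ())
    for count in range(len(segments) + 1):
        for shorted in combinations(range(len(segments)), count):
            total = sum(v for i, v in enumerate(segments)
                        if i not in shorted)
            best = min(best, (abs(total - target), shorted))
    shorted = best[1]
    total = sum(v for i, v in enumerate(segments) if i not in shorted)
    return shorted, total

shorted, total = best_shorting(SEGMENTS, TARGET)
print(shorted, total)  # closest achievable combination
```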

Figure 10.13 Temperature plot of the circuit of Figure 10.10.


Figure 10.14 Technique for arraying resistors in a bandgap design.

In Figure 10.14, first-layer metal (M1) connects the three schematic points BG, 7, and 8 to poly resistor strips.
If extra contacts are included in the layout, then jumpering can be attempted between them, or
to terminal 8, to modify the resistor values. This can be done by scratching through the upper
nitride layer with tungsten probe pins, or through the focused ion beam (FIB) process (more on
this later). Notice also that the 15K value is embedded into the center of the array, so as to best
match the sheet resistivity of the two composite resistors.
This appears to be a very nice reference, but it has a few problems. The transient response
shows that for an abrupt Vdd change, the amplifier will recover quickly, but this also indicates
that high frequency supply variations will find their way into the bandgap output, as shown in
Figure 10.15.
This is from the supply transitioning from 4.9 to 5.0 V at t = 0 and falling abruptly again at
5 μs to 4.9 V. As we see, the response is well behaved, but the supply rejection at high
frequencies is poor. SPICE simulation also shows that a capacitive load of greater than a few
hundred femtofarads will cause the output to be less stable, causing some ringing in the above
plot. This is a really nice reference, but should be buffered with a simple amplifier, and
probably run through a resistor and bypassed heavily with a MOSCAP to ground before
sending it around the chip. You may also experiment with additional capacitors


Figure 10.15 Transient response of bandgap output to abrupt supply variations.

within the circuit to help improve the transient response and the supply sensitivity at high
frequencies.
Never expect to actually draw current from such a signal; it’s not a power supply, just a
stable reference potential.

The Half Bandgap


One problem with analog design today is falling supply potentials, which can make conventional
bandgap references virtually impossible. How do you generate a 1.2 V reference from a 1 V
supply?
This circuit uses devices deep in subthreshold with short, wide gates; it is called a half-
bandgap reference, and is shown in Figure 10.16.
The earlier bandgap references added the voltage drop of a transistor base-emitter junction
to the voltage drop across a resistor to produce the

Figure 10.16 The half-bandgap reference.



Figure 10.17 Startup of the half-bandgap reference.

bandgap output. This reference determines the two components individually, and then
averages them. The output is half that of the full bandgap reference, but this allows the
circuit to operate below a volt.
The circuit uses devices from a 0.6-μ process, and would work to a lower voltage with
devices that have lower threshold voltages, as would be expected in a low voltage logic
process.
M4 and M5 are biased with a simple common source resistance. The summation of the two
outputs is easily accomplished with an extra resistor, R3.
The output versus supply potential is plotted in Figure 10.17.
Designing the devices to work more deeply into subthreshold will allow yet lower Vdd
operation, at the expense of size (wide gates) or output variation with supply (short gates). The
temperature characteristic is shown in Figure 10.18.

Figure 10.18 Temperature stability of the half-bandgap reference.


A Bandgap Supply Regulator


Sometimes a really great bandgap reference isn’t required, but a stable supply is, and usually
not at 1.2 V. If we attempt to use simple techniques to create, say, a 3.3 V regulated supply
from a 5 V raw supply, we can use the regulated output to drive the reference circuitry, so that
issues of supply sensitivity can be reduced. This is shown in Figure 10.19.
Please accept my apology for the complexity. I’ll try to explain, as there are many system
issues at play here. On the left, we have an amplifier that compares the output of the bandgap
reference (on the right) with the tap on a voltage divider at R3 and R4, across the circuit
output. M7 is a large series pass transistor that conducts current from the 5 V supply to the
output, and is driven by the amplifier. In the middle, we have start-up circuitry that detects
whether a reasonable signal level exists within the bandgap and, if not, turns on both the pass
transistor and the bandgap internal current sources. Notice the lack of precision in the device
dimensioning; many times, just about any devices will work fine, only trading one issue for
another. In this case, I just threw in some reasonable values, and they seem to work well.
The objective of the circuit is to supply an internal 3.3 V source that is reasonably well
regulated. Such a supply may be used for 3.3 V logic circuits, as in a 0.35-μ process with a
thick gate oxide option that allows this circuit to run from 5 V.
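The regulated level follows from the divider: the loop forces the R3/R4 tap to equal the bandgap voltage, so Vout = Vbg · (R3 + R4)/R4, assuming R3 is the upper element. For 3.3 V from a nominal 1.2 V reference:

```python
V_BANDGAP = 1.2  # V, assumed nominal reference
V_TARGET = 3.3   # V, desired regulated output

# Loop equilibrium: V_TARGET * R4 / (R3 + R4) = V_BANDGAP,
# so R3 / R4 = V_TARGET / V_BANDGAP - 1.
ratio = V_TARGET / V_BANDGAP - 1
print(round(ratio, 2))  # R3/R4 of 1.75
```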
The output when the supply is swept from 0 to 5 V is shown in Figure 10.20.

Figure 10.19 Supply regulator.


Figure 10.20 Startup of the bandgap supply regulator.

This is a plot of the output while loaded by a 100 Ω resistor, driving 33 mA into the load.
The output is very stable with supply voltage, since the bandgap reference is powered by the
output. The body connections to the PMOS devices within the bandgap are drawn to the
output node, indicating a separate well for these devices. The temperature characteristic is
shown in Figure 10.21.
The output is stable, with small output capacitances attached, up to a few nanofarads, but
will become increasingly unstable when loaded with large bypass capacitors. If this is to be
used as a supply for low voltage logic, I strongly suggest that the output be brought out to a
pin where an external bypass capacitor can be attached, and the compensation

Figure 10.21 Temperature sensitivity of the supply bandgap regulator.


components within the circuit will then need to be adjusted for the expected capacitive load.

A Temperature Sensor
The two components of a bandgap reference are almost perfectly opposed to each other as
temperature changes. If we take the difference between the two terms, we should have a good
temperature sensor.
We will need to use an amplifier to produce an output that swings from rail to rail as the
temperature varies. The schematic is shown in Figure 10.22.
The feedback resistor R3 is very large. Increasing the currents throughout can lower the
required resistor values. The general output characteristic depends on R3, which sets the scale
of the system, and R2, which sets the offset. The circuit is expected to run from a fairly stable
supply potential, and has an output versus temperature characteristic, as shown in Figure
10.23.
The temperature characteristic is fairly stable with expected process variations. A 10% shift
in sheet resistivity value leads to a 1°C temperature error. This is not a precision
thermometer, but the output is quite linear; only a few bits of calibration information would be
required to

Figure 10.22 A temperature sensor.


Figure 10.23 Output of temperature sensor from 0°C to 100°C.

adjust the output to very high accuracy. One can imagine adding circuitry to the temperature
sensor to also provide local supply regulation, providing immunity from supply variations.
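Because the output is so linear, a two-point calibration recovers the whole scale, which is what a few bits of calibration information amounts to. The measurement data here are invented:

```python
# Two hypothetical calibration measurements: (temperature C, output V).
CAL_POINTS = [(0.0, 0.40), (100.0, 3.10)]

def make_thermometer(points):
    """Build a linear volts-to-degrees converter from two cal points."""
    (t0, v0), (t1, v1) = points
    slope = (t1 - t0) / (v1 - v0)  # degrees C per volt
    return lambda volts: t0 + slope * (volts - v0)

to_celsius = make_thermometer(CAL_POINTS)
print(round(to_celsius(1.75), 1))  # mid-scale reading for this data
```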
This temperature sensor is a perfect example of how knowledge of the available elements,
which are really quite simple, can lead to a synthesis of novel devices.
This is sandbox “thinking.”

Source : ASIC Design in the Silicon Sandbox Keith Barr 287

Oscillators, Phase-Locked Loops, and RF


Introduction
The internal component tolerances of CMOS devices vary from lot to lot to such
an extent that only crude (±30%) on-chip free-running RC oscillators can be built,
although, given sufficient supply current and space, LC oscillators can be built to 5%
tolerance or better. Due to the small inductance values that can be achieved in a
reasonable space, these oscillators will run at a high frequency and draw considerable
current. For reasonable precision, an external crystal oscillator reference is required,
which can then be “kicked up” to any frequency desired through the use of a phase-
locked loop (PLL).
The following can serve as both ideas to consider incorporating into a design, as well
as further examples of analog circuits for the purpose of discussion.

LC Oscillators
A spiral inductor can be fabricated from the metal layers, but inductance is low and
resistive losses are high. Since the Q of an inductor is the ratio of its reactance to its series
resistance, operating such oscillators at high frequencies provides higher Q values. Spiral
inductors built on silicon suffer from additional losses due to the resistive substrate that
further loads the resonator; the magnetic field from the inductor penetrates the substrate
and generates potentials that suffer resistive losses there. One method of improving the
LC oscillator is to build the spiral inductor in the top metal layer (which is usually thicker
and of higher conductivity). The bottom metal layer is used to bring out the center
connection.


Simple oscillators like this must run at multi-GHz frequencies to obtain a high Q,
which may at best be only 10. The output, however, may be the most accurate and
predictable frequency for a free running oscillator, as the metal patterning is quite
accurate, and although capacitances will vary with process, often by ±10%, the resulting
frequency tolerance will be the square root of the capacitance tolerance, since f = 1/(2π√(LC)).
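Because f varies as 1/√C, a fractional capacitance error produces roughly half that fractional error in frequency. A quick check:

```python
import math

def freq_error_from_cap_error(cap_tolerance: float) -> float:
    """Fractional frequency error for a fractional capacitance error,
    from f = 1 / (2*pi*sqrt(L*C)) with L held fixed."""
    return 1 / math.sqrt(1 + cap_tolerance) - 1

# A +10% capacitance shift pulls the frequency down by only ~4.7%:
print(round(freq_error_from_cap_error(0.10) * 100, 1))
```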

RC Oscillators

A simple RC relaxation oscillator has a period of approximately 2.2 × R × C. Such an
oscillator will vary significantly in frequency due to process resistance and capacitance
tolerances, and should be expected to wander by as much as ±30% from lot to lot. This may, however,
be acceptable for some purposes. The range of frequencies is broad, easily from a few
kHz to several hundred MHz.
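As a worked example of the 2.2 × R × C relation; the component values here are hypothetical:

```python
def rc_oscillator_freq(r_ohms: float, c_farads: float) -> float:
    """Approximate oscillation frequency from the period ~ 2.2*R*C."""
    return 1.0 / (2.2 * r_ohms * c_farads)

# A hypothetical 100K resistor and 1 pF capacitor give roughly 4.5 MHz:
print(round(rc_oscillator_freq(100e3, 1e-12) / 1e6, 2))
```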
If better frequency control is required, an external resistor can be connected through a
single pin to set frequency, as the capacitor values are usually controlled to be within
10% of nominal in a good CMOS process. A schematic for such an oscillator is shown in
Figure 11.1. The term PAD is meant to represent a fully protected bonding pad.
The transient response output, plotting C1’s potential, is shown in Figure 11.2.


Figure 11.1 An oscillator that runs at a frequency determined by an internal capacitor and an
external resistor.

The oscillator is running at about 8.5 MHz, with a 100 KΩ external control resistor. The
output pulse width is about 5 ns, evidenced by the period during which C1’s voltage is
approximately 0 V. The circuit draws about 100 μA.

Figure 11.2 The transient output of the circuit shown in Figure 11.1.


The input resistance is compared to an on-chip bandgap reference to set a proportional
current through M2 and M7. C2 stabilizes the current reference in the case of pin capacitance,
which could be several picofarads. The capacitor C1 charges until the second amplifier senses
that the cap voltage has exceeded the bandgap reference potential. M8 is then turned on,
discharging the cap completely before the amplifier can propagate its signal through the
inverter chain. The first inverter in the chain is designed with long gates to load the amplifier
and, therefore, control the output pulse width.
Considerations: The inverter chain delay and the amplifier response driving the inverters
will control output pulse width, which must not be too short, or C1 will not become
completely discharged. Also, at this frequency, C1 is charging to a potential substantially
above the reference before the amplifier can propagate a “reset” signal to M8. The
frequency will not, therefore, be accurately inverse to the external resistor value, as extra time
is taken during each cycle for the overvoltage and full reset process. We can make the pulse
output intentionally short, but if it’s too short, we will have difficulty clocking circuits with
it. Also, the gate of M10 will draw sharp transient currents from the bandgap reference when
the circuit switches. A series resistor and a bypass MOSCAP at the gate of M10 may be in
order to keep the bandgap reference line clean.
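These considerations can be folded into a rough frequency model: with a 1:1 mirror, the ideal cycle time is Rext·C1 (a current of Vbg/Rext charging C1 to Vbg), plus a fixed delay for the overvoltage and reset. Both the C1 value and the delay here are invented for illustration:

```python
def oscillator_freq(r_ext: float, c1: float, extra_delay: float) -> float:
    """Frequency when each cycle takes the ideal R*C charge time
    (I = Vbg/R charging C to Vbg gives t = R*C, assuming a 1:1
    mirror) plus a fixed propagation/reset delay."""
    return 1.0 / (r_ext * c1 + extra_delay)

R_EXT = 100e3  # ohms, external resistor (as in the text)
C1 = 1e-12     # F, assumed capacitor value
DELAY = 18e-9  # s, hypothetical overvoltage-plus-reset time

print(round(oscillator_freq(R_EXT, C1, 0.0) / 1e6, 1))    # ideal case
print(round(oscillator_freq(R_EXT, C1, DELAY) / 1e6, 1))  # with delay
```

With these assumed values, the fixed delay drops the frequency noticeably below the ideal 1/(R·C), illustrating why the frequency is not accurately inverse to the external resistor.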
A square wave oscillator with external resistor control is shown in Figure 11.3.
The oscillator of Figure 11.3 does not require a bandgap reference; it sets the voltage across
the external resistor to the midpoint of the voltage divider on the far right. The current through
the circuit will then depend on the external resistor and the supply voltage. The divider also
provides positive and negative trigger levels to two comparators, so that as supply voltages
vary, the charging and discharging of C1 is to voltage limits that are also a function of supply
voltage; the output frequency is, therefore, rather independent of supply voltage.
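The supply independence can be sketched numerically: the charging current and the comparator trigger window both scale with Vdd, so Vdd cancels in the frequency. The window fraction below is invented; only the cancellation matters, and the absolute frequency depends on mirror ratios not modeled here:

```python
def osc_freq(vdd: float, r_ext: float, c1: float) -> float:
    """Relaxation-oscillator frequency model: the divider holds the
    external resistor at vdd/2, and the comparator trigger window is
    assumed to span vdd/3; both scale with vdd, so vdd cancels."""
    current = (vdd / 2) / r_ext  # capacitor charge/discharge current, A
    window = vdd / 3             # peak-to-peak triangle amplitude, V
    half_period = c1 * window / current
    return 1.0 / (2 * half_period)

# Identical frequency at 3 V and 5 V supplies:
print(round(osc_freq(3.0, 10e3, 1e-12) / 1e6, 2))
print(round(osc_freq(5.0, 10e3, 1e-12) / 1e6, 2))
```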
M11 and M12 switch the current mirror devices M9 and M10, depending on the output state.
When node 9 exceeds a threshold set by the resistor divider, one of the comparators affects
M23 or M24 to “flip” the currents charging the capacitor to opposite polarities. The output
inverter INV2 is strong, to provide a good output drive, while INV1 is weak, to hold the last
condition.
Oscillators such as these, which trigger from defined potentials, are susceptible to transients
that could cause them to flip prematurely. The oscillator must be protected from high-noise digital
environments. M25 and M26 act as bypass capacitors so that the effect of supply glitches is
minimized.


Figure 11.3 Improved oscillator that relies on an internal capacitor and an external resistor.

This oscillator, with a 1 pF cap for C1, runs at about 40 MHz with an external resistor of 10
KΩ, and draws about 1.6 mA. When the external resistor is 100K, the oscillator runs at 4
MHz and draws about 200 μA.
The C1 node and the output can be plotted to confirm this behavior.


Crystal Oscillators
Crystals must be used as external devices when greater frequency accuracy is needed. The
classic crystal oscillator circuit is illustrated below.

Crystals are piezoelectric (electromechanical) devices that resonate at a frequency
determined by their physical dimensions. The effective Q of a quartz crystal is extreme, on
the order of 50,000. The equivalent circuit for SPICE modeling is shown below.

For a 12-MHz crystal, typical values would be CX = 20 fF, LX = 8.8 mH, RX = 15 Ω; these
are absurd numbers from an electronic point of view. They are simply “motional”
equivalent values from a mechanical resonance/piezoelectric model. CP is the package
capacitance and may be on the order of 4 pF. When such numbers are used to simulate an
oscillator in SPICE, the tremendous Q of the circuit makes it difficult to start the oscillator, in
which case I suggest a starting current through the inductor portion:
LX 1 2 8.8e-3
Istart 1 2 pwl 0 1m 1n 0

This will start the oscillator with an initial 1 mA of current through the inductor. If you run
the simulation long enough, with the options turned up, you may be able to see if the
amplitude is increasing or decreasing. Decreasing amplitude means the starting current is too
high or the system has insufficient gain to function.
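The motional values above can be sanity-checked numerically; a short sketch computing the series resonance and Q implied by the model:

```python
import math

# Sanity-check the motional (series RLC) crystal model quoted in the
# text: CX = 20 fF, LX = 8.8 mH, RX = 15 ohms for a 12 MHz crystal.
LX, CX, RX = 8.8e-3, 20e-15, 15.0

f_series = 1.0 / (2 * math.pi * math.sqrt(LX * CX))  # series resonance
Q = 2 * math.pi * f_series * LX / RX                 # motional Q

print(f"series resonance ~ {f_series/1e6:.1f} MHz")  # ~12.0 MHz
print(f"Q ~ {Q:.0f}")                                # tens of thousands
```

The numbers land at about 12 MHz with a Q in the mid tens of thousands, consistent with the "extreme Q" claim above.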


The termination capacitors C1 and C2 are much larger than the internal equivalent
capacitance of the crystal resonator, which implies that the currents through the crystal are
large, on the order of a milliamp; this will cause the “potential” at the junction of the L
and the C of the crystal to reach a peak of several thousand volts. During operation the
signals at each end of the crystal will be 180° out of phase, and swing between supply and
ground.
The oscillator requires R1, which biases the inverter into its linear amplifying range, and R2,
which keeps the high-frequency content of the inverter output from exciting the crystal at an
overtone frequency.
The output of the inverter is a well-clipped sine wave, approaching a square wave. As a
square wave is the summation of a fundamental and its odd harmonics, it is rich in the 3rd,
5th, 7th, and higher harmonics. R2 acts against C2 to reduce the 3rd and higher harmonics. If
this feature is not incorporated into the design, the crystal can easily begin operation at a
higher “mode.” C1 and C2 are recommended by the crystal manufacturer, and R2 is set to
roughly equal C2’s reactance at the operation frequency.
When building these oscillators, it is advised to make the amplifier a robust inverter, and
send the output through a resistor to the crystal pin. Also, the capacitors can easily be
integrated into the IC, so that external components are not required. Not shown are the
obvious protection devices that will exhibit capacitance as well, and should be taken into
account when dimensioning the capacitor elements.

M1, m2, m5, and m6 are proportioned to function as the load capacitors for the crystal. R1
biases the amplifier, and is typically 500 kΩ to 1 MΩ for high frequency (>1 MHz) crystals. R2
reacts with the output MOSCAPs to attenuate the third harmonic, typically set so that the


response between R2 and the output capacitance is 3 dB down at the operation frequency, on
the order of 500 Ω for a 12-MHz oscillator. This allows a clean logic output, smooth
fundamental operation, and no external parts, except for the crystal.
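The R2 sizing rule (3 dB down at the operating frequency, i.e., R2 equal to the output capacitance's reactance) can be checked numerically. A sketch, assuming a 27 pF load capacitance (my assumption, a common crystal load value; the text only states the ~500 Ω result at 12 MHz):

```python
import math

# R2 sized so the R2/C2 low-pass is 3 dB down at the operating
# frequency, i.e. R2 equal to C2's reactance there.
# The 27 pF load capacitance is an assumed illustrative value.
def r2_for_3db(f_osc, C2):
    return 1.0 / (2 * math.pi * f_osc * C2)

print(r2_for_3db(12e6, 27e-12))  # ~490 ohms, near the text's 500 ohm figure
```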
Crystals come in basically two types: ones that operate in thickness mode, typically several
hundred kHz and above; and lower frequency crystals that operate in length mode. Be very
careful about designing with low-frequency thickness mode crystals, as a 1-MHz crystal is
quite expensive, simply on account of the material used. Only once the frequency is above
about 4 MHz does the price come down. Length mode crystals are most commonly used in
watches, at precisely 32,768 Hz. They are produced in enormous quantity and are stunningly
cheap. A 32,768-Hz crystal can cost under 10 cents in quantity and be very accurate and
stable, but only at room temperature. The higher frequency AT cut crystals must be used for
frequency stability over a wide temperature range. A 15-stage ripple counter, which can divide
32,768 Hz down to 1 Hz, is used for ticking the second hand in a quartz watch.
Length mode crystals, however, are extremely sensitive. They cannot be driven casually,
with the kind of circuits that would be appropriate for higher frequencies; if overdriven, they
actually mechanically “break.” If you slap a 32 kHz watch crystal across a biased inverter,
it will become permanently damaged within a fraction of a second. When starting them in a
SPICE simulation, use a current of a microamp or so across the mechanical inductance. A
watch crystal model would be LX = 11,800 H, CX = 2 fF, RX = 50 kΩ, and CP = 2 pF. The
drive level must be kept below 1 μW; it is calculated from the current required to produce
the voltage across C1 and C2, and the internal resistance: P = I² × R.
The load capacitors recommended for watch crystals usually call for C1 and C2 to be on the
order of 10 pF. At this frequency, the reactance is about 500 kΩ, and an appropriate series
resistor would then be of this value. The bias resistor, R1, would best be on the order of 10 MΩ.
This calls for very high sheet resistivity in the selected process. The inverter within the watch
crystal oscillator can be made extremely weak, with the entire circuit consuming only a few
microwatts.
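The drive-level arithmetic above can be worked through numerically; a sketch using the model values from the text (10 pF load caps, RX = 50 kΩ, and the 1 V peak terminal swing used as the example amplitude later in this section):

```python
import math

# Drive-level check for a 32,768 Hz watch crystal, using the text's
# model value RX = 50 kOhm and 10 pF load caps. The 1 V peak terminal
# swing is the example amplitude from the text.
f = 32768.0
C_load = 10e-12
RX = 50e3
V_peak = 1.0

Xc = 1.0 / (2 * math.pi * f * C_load)   # load-cap reactance, ~500 kOhm
I_rms = (V_peak / Xc) / math.sqrt(2)    # approximate crystal branch current
P_drive = I_rms**2 * RX                 # P = I^2 * R dissipated in RX

print(f"load reactance ~ {Xc/1e3:.0f} kOhm")
print(f"drive power ~ {P_drive*1e6:.2f} uW")  # well under the 1 uW limit
```

At a 1 V swing the dissipation comes out around a tenth of a microwatt, comfortably inside the stated 1 μW limit; overdriving the terminals quickly erodes that margin, since power grows with the square of the current.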
Because of the low frequency involved and the enormous Q value, the watch crystal takes a
second or so to come up to operating signal swings. High-frequency crystals can take several
milliseconds to “come up.” This must be considered whenever using any crystal oscillator.
All crystal oscillators must be followed by a Schmitt trigger so that high frequency noise
on-chip does not cause double pulses as the oscillator output swings through mid-supply.
Further, in the case of a watch crystal with an AC peak terminal voltage of 1 V, the rate of
change at either terminal is only about 200,000 V/s, or 200 mV/μs. The oscillator


output, if used to drive a PLL, can have a jittery output in the presence of on-chip noise. One
solution is to differentially amplify the two crystal pad potentials with a simple amplifier to
increase the rate of rise/fall, and then send the output through a Schmitt trigger.
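The slew-rate figure is easy to verify: for a sinusoid of amplitude A and frequency f, the peak rate of change is 2πfA. A quick check:

```python
import math

# Maximum slew rate at a watch-crystal terminal: for a sinusoid of
# amplitude A and frequency f, dV/dt peaks at 2*pi*f*A.
f, A = 32768.0, 1.0                 # 32,768 Hz, 1 V peak (text's example)
slew = 2 * math.pi * f * A          # volts per second

print(f"{slew:.0f} V/s ~ {slew*1e-3:.0f} mV/us")  # roughly 200 mV/us
```

Two hundred millivolts per microsecond is glacial by on-chip standards, which is why noise riding on such a slow edge can easily produce multiple threshold crossings without the Schmitt trigger.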

X1 and X2 connect to the pads of the IC’s crystal terminals. It is suggested that a watch
crystal circuit be built between two ground pads, with identical features at each pad so that any
common mode substrate or power interference will be removed by the differential amplifier.

Phase-Locked Loops
The watch crystal frequency can be multiplied several thousand times by the use of a phase-
locked loop. In fact, no matter what the clock source, PLLs can be used to produce clocking
frequencies on-chip that would be problematic if sent off-chip to an external device. The low
internal capacitances on-chip versus the rather high capacitances of pad structures and PCB
traces make internal clock signals in the several hundred MHz range quite convenient and
provide a low power solution to timing problems. The power required to produce a 50-MHz
clock that actually appears on an IC pin, and drives just 1 in. of PCB trace to another single IC
input could be several milliamps, whereas if kept on-chip, the current requirement would be in
the 100 μA range. Very high clock frequencies can be used internally without causing severe
RFI emission; it is suggested that you seriously consider reasonably high


internal processing clocks, and make every effort to keep the I/O signals at a low rate, simply
for RFI reduction and current consumption purposes.
The PLL structure is that of a voltage-controlled oscillator (VCO), a counter (frequency
divider), a phase comparator, and a loop filter.

The phase comparator for this purpose is the dual flip-flop phase/frequency detector
illustrated in Chapter 8. The counter determines the ratio of output frequency to input
frequency. The filter is required to stabilize the loop.
The inputs to the phase comparator will be at the input frequency. The counter output will
be an effective divider of the VCO output, so the VCO will produce a frequency that is the
number of counter states times the input frequency. The phase comparator will act on the
rising edges of its input signals, and can therefore only output a controlling signal in the brief
period of time between rising edges that are displaced in time (shifted in phase). The filter is
designed to smooth this pulse-like phase detector output, so that the control to the VCO is as
smooth and continuous as possible, leading to a uniform, jitter-free VCO output. Once locked,
the output of the phase detector will be very brief pulses, if any at all.
Unfortunately, this implies that the filter has to be a low-pass averaging filter, or an
integrator, which actually leads to loop instability. The VCO input is a frequency control, but
the phase comparator only responds to phase. Since phase is the integral of frequency, the
VCO can be seen as an integrator in terms of phase; if the frequency is off by a bit, the phase
error to a reference frequency will continually increase. The insertion of a pure low-pass filter
into the loop will produce a control loop that effectively contains two integrators, which we
can predict to be an unstable condition. The loop filter must contain an additional
characteristic to cause stability by actually passing higher frequency components, while also
providing the average of the phase detector’s output.
It is important to understand this problem intuitively, so allow me to explain the interaction
of the PLL components in greater detail. Let’s imagine that the VCO frequency is correct,
but the output of the counter is not in exact phase alignment with the input signal. The phase
detector


will output pulses to correct this through the filter, but correction can only be accomplished by
changing the frequency of the VCO, which was initially correct. The VCO frequency will be
slowly changed by the filter output until phase alignment is achieved, but at this point, the
VCO frequency is no longer correct. The phase of the counter output will then continue on to
produce an opposite phase error, increasing in magnitude with every counter output pulse. The
phase detector will then act through the filter to correct this new error, but can only do so
while the phase error is sufficient to produce a phase detector output. The system will
constantly oscillate—when the frequency is correct, the phase will not be, and when the phase
is correct, the frequency will not be.
Stability can be achieved by “bumping” the oscillator during every phase comparison so
that as the VCO’s frequency is slightly affected, its phase is immediately affected. The
schematic shown so frequently in texts (the essence having been lifted from a Motorola
CD4046 data sheet) is shown in Figure 11.4.
FF1 is clocked through INA from the external frequency source; FF2 is clocked through
INB from the counter output. Both flops have their D inputs held at supply, so they will
independently set on rising signal inputs. If both flops become set, the AND gate will
immediately reset them. If either becomes set, the OR gate enables the TS buffer to output
either high or low to the loop filter R1, R2, and C to produce a VCO control signal at VC. If
the phase of INB lags INA, then FF1 will set, the TSBUF will output high, and when the
rising edge of INB comes along, the TSBUF will be turned off as both flops are reset. Under
these conditions, positive pulses will occur at the TSBUF output, pulling R1 to

Figure 11.4 Classic frequency/phase comparator, with tristate output.


VDD briefly. When TSBUF turns off, its output will settle at the voltage stored on C. In this
case, the VCO would be designed to increase its operation frequency with a more positive
control signal.
Although the capacitor C will hold a control voltage to drive the VCO continuously, the
series resistor R2 allows a brief extra VC potential while the phase comparator output is
active; this adjusts the phase by over-controlling the VCO temporarily. In effect, the phase is
being corrected at each phase error pulse so that the VCO frequency can be only slightly
varied while the loop is being brought into phase lock. In principle, this works just fine. In
practice, operating at low frequencies, it is flawed.
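The loop behavior described above can be sketched as a toy discrete-time model, with x as the phase error and v as the VCO frequency error, one step per phase comparison; the gains kp and ki are illustrative values chosen for the sketch, not figures from the text:

```python
# Toy model of the loop dynamics: the integral path (ki) slowly
# retunes the VCO frequency via the filter cap, while the
# proportional path (kp) plays the role of R2, bumping the phase
# directly during each correction. With ki alone the error just
# circulates; adding kp damps it. Gains are illustrative only.
def run_loop(kp, ki, steps=60, x0=1.0, v0=0.0):
    x, v = x0, v0      # x: phase error, v: VCO frequency error
    tail = []
    for _ in range(steps):
        v = v - ki * x        # filter cap retunes the VCO (integral path)
        x = x + v - kp * x    # phase accumulates; kp is the direct bump
        tail.append(abs(x))
    return max(tail[-20:])    # worst error late in the run

print(run_loop(kp=0.0, ki=0.2))  # integral only: error keeps circulating
print(run_loop(kp=0.8, ki=0.2))  # with the proportional bump: error dies out
```

The integral-only loop is exactly the two-integrator system described above: the error orbits between "frequency right, phase wrong" and "phase right, frequency wrong" and never settles, while the proportional term collapses it within a few comparisons.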
The problem with this arrangement is that when the system is in lock, the TS buffer will
emit extremely brief pulses, indicating very slight phase errors, which are to be expected; the
phase will rarely be exact. When this happens, due to stray capacitance, the output terminal of
the TS buffer will not immediately return to C’s value. As the TS buffer aggressively forces
its output to VDD or ground (depending on the phase error), its output capacitance will
be charged, and the additional charge held by that capacitance will constitute a correction
signal greater than a tiny phase error would normally produce. If R1’s value is high,
as would be expected in an on-chip PLL with a small C value, the time constant between R1
and the stray capacitance at the TS buffer output (which includes capacitance from R1 to
substrate) could be on the order of tens of nanoseconds. A 1 ns correction pulse will then have
the effect of a 10 ns pulse, which leads to instability on a short time basis; the resulting PLL
will demonstrate a jitter component. This effect can be minimized by making the drive
transistors in the TS buffer only large enough to pull sufficiently on R1—certainly not the
aggressive devices that would be used in a standard cell. Also, split the positive from the
negative paths (raw flip-flop outputs) so that both affect the filter simultaneously, prior to
reset, and increase the delay through the AND gate so that both paths turn on in the case of a
very short correction pulse. These issues require close SPICE inspection.
A more common solution is to replace the TS buffer and the high-valued resistor R1 with
switched current sources, often called a charge pump; a term, I feel, that is somewhat
misleading. It would be more aptly termed a switched current source or a charge gate.
Voltage doubler circuits, which would more appropriately be called charge pumps, are
described in a later chapter.
For low input frequency PLLs, such as one driven by the watch crystal oscillator, the time
constant of R1 and C must be large, as the period between phase correction pulses is relatively
long. With on-chip capacitors being limited to the under 100 pF range, R1 becomes
prohibitively large. One solution is to place the loop filter components external to the IC,
allowing a reasonable R1 value to exist inside the chip, with R2 and


a large C value off-chip. The unfortunate consequence of this is that the ground external to the
chip is not the same as ground on-chip, due to sharp supply currents through the inductance
and resistance of the supply and ground bonding wires. If such an approach is taken, an
additional RC filter should be imposed between the external network pin and the VCO control
input. When the system uses a switched current source to replace the tristate gate of Figure
11.4, then the large-valued resistor R1 can be removed. R2 will exhibit a small voltage drop
during phase corrections, which brings about stability.
The entire filter can be fully integrated if precautions are taken, but, first, let’s look at the
VCO part to gain an understanding of where VCO design issues might offer alternative
opportunities.
The simplest VCO is the ring oscillator.

This is three inverters in a loop, with an output amplifier and a Schmitt trigger. The ring
may be any odd number of inverters in a loop; I’ve used three here for simplicity, but 5- and
7-stage rings are common. The terminal RINGTOP is connected to the PMOS devices and the N
well in which they are built. Varying the potential on RINGTOP will adjust the frequency of
the loop, and the current that the loop draws. Often MOSCAPs that are similar in size to the
inverter transistors are added at each inverter input to act as an additional load to the previous
stage, lowering the oscillation frequency and limiting the extent to which terminals are driven
above the RINGTOP supply or below ground. This can be an important feature, which I will
explain better in detail later. The output is derived through an amplifier to bring the internal
ring signals up to full supply logic levels.
The frequency of the ring for a given applied voltage will vary inversely with the square
of the gate length; as the gate length is doubled, the load capacitance on the previous
stage is doubled, and the


Figure 11.5 Ring oscillator frequency vs. control voltage.

current drive to the next stage is cut in half. A frequency versus RINGTOP voltage plot for
1-μ gate lengths (with 1 μ × 4 μ MOSCAPs attached) is shown in Figure 11.5.
The frequency output covers a considerable range. If we realize that the ring oscillator also
can be controlled by a current into the RINGTOP terminal, we can imagine driving this node
with a very long and wide PMOS device that becomes both a low noise current driver and a
storage capacitor.
In the schematic of Figure 11.6, m15 is the large PMOS device that controls current into the
ring oscillator, controlling its frequency. In this

Figure 11.6 Schematic of control circuitry for a phase-locked loop using the ring VCO.


case, the gate of the PMOS device, labeled VC, is the control voltage, which will decrease in
potential to increase the output frequency.
In this circuit, there are two separate control mechanisms: the first affects frequency by
injecting brief current pulses into m15’s gate, affecting frequency; the other briefly adds or
subtracts current directly at RINGTOP to affect phase. M1 accepts a bias potential that
establishes a current through m2 and a matching current through m4; m5 and m8 can then
have their currents gated into VC by the control terminals from the phase-detecting flip-flops.
This is the entire mechanism for frequency control.
Phase control is established by deriving a current that is proportional to that which is sent
into RINGTOP by m9 and m10. M11 can then subtract current from the RINGTOP terminal,
or m14 can add to it. M12 and m13 are also driven by the flip-flop outputs to enable these
potential current sources.
This is probably the best method for PLL control on-chip, especially when operating at low
phase comparison frequencies, and requires no external components. The phase control
currents through m11 and m14 should be approximately 50% to 80% of the frequency setting
current into RINGTOP for the fastest settling time. Stability is then controlled by the currents
that affect VC. If these currents are too high, the loop will tend toward instability; if too small,
the loop will be overdamped, and will take a long time to settle. Further, overdamped PLLs do
not control the natural jitter of the VCO, and will demonstrate poor control over output phase
error.
The correct values for these currents are best determined by experimentation and SPICE
simulation; too many variables are involved to arrive at a workable solution through individual
analysis of the VCO or filter currents. However, certain rules do apply—the large PMOS
device, m15, will determine the change in VC required for a given change in frequency, and if
operated toward the subthreshold region, the freq/VC slope will be steep, leading to much
smaller allowable VC control currents. Operate m15 as deep into saturation as possible.
Finally, as with all fixed filters, the range of operation will be limited; that is, the PLL
should be designed around a fairly limited range of frequencies. The range can be extended,
however, through the use of a frequency-to-current converter that increases the bias currents
(increasing VB) depending on operating frequency. Such a converter can be fed with the PLL
input, and the reference voltage VB can be fed to the PLL. Without this feature, the PLL will
go unstable when operated at frequencies below the design range. The frequency to bias
potential converter can significantly extend the operating range of the PLL. If you’re getting
to this point with your designs though, you’re digging deep into the sandbox!


The inverters and gates provide nonoverlapping pulses to m1 and m2 to alternately short out
C1 and connect the discharged value to m3. The average current transferred will be V × F
× C, where V is VDD minus m3’s Vgs potential. M3 is a large area PMOS device that acts
as a load as well as a filter capacitor. Rf and m4 further filter the resulting control signal, and
m4 ultimately conducts a current to m5, which produces a bias voltage for the PLL.
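The average-current relation above can be illustrated numerically; the supply and capacitor values below are assumptions chosen for illustration, not figures from the text:

```python
# Average current delivered by the switched-capacitor frequency-to-
# current converter: I = V * F * C.
# All three values below are assumed for illustration.
V = 2.0        # VDD minus m3's Vgs, assumed
F = 32768.0    # switching (input) frequency
C = 1e-12      # C1, assumed 1 pF

I_avg = V * F * C
print(f"{I_avg*1e9:.1f} nA")  # ~65.5 nA
```

Even with a full watch-crystal input, the converter delivers only tens of nanoamps, which suits the micropower bias generation it feeds.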

PLL Precautions
PLLs are difficult to simulate, because of the long time required for many VCO cycles to
show the settling of VC. For this reason alone, the counter should be intentionally short. If a
high multiplication ratio is desired, as in kicking a 32-kHz watch crystal up to 134 MHz
(4096:1), do the conversion in two stages of 64 each, or three stages of 16 each. Simulation
times will be reasonable as a result. Further, large multiplication ratios will lead to greater
jitter in the VCO output, for in the previous case, 4096 VCO cycles will elapse before phase
control can be applied. PLLs with smaller dividers generally result in tighter phase control.
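The arithmetic of splitting the multiplication ratio is straightforward:

```python
# Splitting a 4096:1 multiplication into cascaded PLL stages, as
# recommended: two stages of 64, or three stages of 16.
f_in = 32768.0             # watch-crystal input

f_two = f_in * 64 * 64     # 64 * 64 = 4096
f_three = f_in * 16**3     # 16^3 = 4096

print(f_two, f_three)      # both ~134.2 MHz
```

Either split reaches the same 134.2 MHz output while keeping each loop counter, and therefore each simulation, short.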
The raw phase noise of the VCO when measured without phase locking control will exhibit
a large 1/f component, as would be expected using small MOSFETs. Further, even at high
frequencies, thermal noise can corrupt the free-running VCO frequency (and hence, phase).
The universal technique for reducing noise is to increase the current through


the VCO. Designs with large gate areas are required at low frequencies (up to 1 MHz) to lower
1/f noise, which naturally leads to increased currents, and at high frequencies, the use of wide
devices (also increasing supply current) will lower thermal noise. When a large multiplication
ratio is needed, it is helpful to split the PLL into two or more cascaded sections for simulation
practicality, but also because each VCO will have its own specific needs in terms of device
sizes to control noise.
PLLs with a high multiplication factor can be affected by transients generated from logic
circuitry on the chip. One mechanism has to do with the signals within the ring oscillator
being driven by capacitive effects below ground. The substrate potential is rarely quiet in a
mixed signal design; excursions of junctions within the oscillator, below ground, will be
affected by substrate transients at those moments, slightly “nudging” the phase of the
oscillator in between phase comparisons. Such PLLs will show jitter that can be reduced by
MOSCAPs within the VCO ring, thereby reducing signal excursions. A single ring oscillator
stage with internal signal swing limiting bypass capacitors is shown here.

When simulating the components of a PLL, you must look at every possible starting
condition, and make sure that the complete circuit will come into a stable condition. For
example, if the maximum frequency of the VCO is beyond the capability of the counter, you
can bet that at some point the PLL will output that frequency: the phase comparator sees
nothing coming from the loop counter, which sustains the condition. Also, if the input frequency is
brought into the IC with a standard inverter in the pad structure, you can also bet that the input
signal will glitch due to internal noise, forcing the PLL to act on each edge of the input signal.
Schmitt triggers are mandatory, at the input to the IC (if the clock source is external) and at the
output of the VCO.


RF Local Oscillators and Predividers


For RF applications, special techniques must be used to generate voltage-controlled
frequencies with high spectral purity. Further, predividers must be used to reduce the very
high oscillator frequency to one that is manageable by standard cell logic circuits so that
frequency control can be established with a PLL. Differential oscillator operation provides
perfectly opposed signals as clock control to a predivider, and the high energy storage
capability of the LC-tuned circuit provides spectral purity at relatively low power
consumption.
The ring oscillator is limited to an odd number of stages, so perfectly opposing clock
outputs are not possible, and the phase noise of the ring oscillator is unattractive at reasonable
supply currents. Differential ring oscillators have been proposed, which can be produced with
an even number of stages, but their performance does not compare to the LC-tuned circuit. LC
oscillators suffer from restriction in voltage tunability, so designs may need some
experimentation before optimum operation is achieved.
LC oscillators require a fairly large area for inductors, and such components are still only
capable of Q values on the order of 6 to 10. For differential oscillators, an inductor layout
using three metal layers is shown below.

M3 is used as the inductor layer, because of its thickness and lower resistance, as well as its
greater distance from the substrate. Vias connect the coil center through M2 to the substrate
shield drawn in M1. The substrate shield is called a Faraday shield; it shields the coil from the
substrate capacitively, but does not act as a “shorted turn,” which would decrease the
coil’s inductance and quality. Magnetic fields still penetrate the substrate and constitute a loss
factor, but capacitive coupling to the resistive substrate can be responsible for additional loss
factors. The above inductor, if drawn to be about 200 μ on a side, would have an inductance of
about 3.2 nH and a reactance of about 50 Ω at 2.5 GHz.


If the M3 layer is thick, the resistance would be about 10 Ω, leading to a Q of about 5. The
skin effect, which would normally dominate the resistance at such frequencies, does not come
into play, as the metal layers are so thin. If the inductor is placed on the very topmost metal
level, the use of a Faraday shield may not be required.
The larger the inductor, the greater its magnetic penetration into the substrate, so small
inductors lead to higher Q values, but lower inductance. The use of narrow metal traces
increases series resistance, but the use of wide traces leads to eddy current losses. The design
of good spiral inductors is very much experimental.
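The reactance and Q figures quoted above follow directly from X = 2πfL:

```python
import math

# Reactance and Q of the example spiral: ~3.2 nH with ~10 ohms of
# series resistance at 2.5 GHz, as quoted in the text.
L, R_series, f = 3.2e-9, 10.0, 2.5e9

X = 2 * math.pi * f * L      # inductive reactance
Q = X / R_series             # simple series-loss Q

print(f"X ~ {X:.0f} ohms, Q ~ {Q:.1f}")  # ~50 ohms, Q ~ 5
```

This simple series-loss Q ignores substrate and eddy-current losses, so a fabricated coil will generally come in at or below this estimate.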
A differential oscillator is illustrated below.

The amplitude of oscillation is set by the current through the source resistor, which supplies
limited power with which to overcome the losses in the LC circuit. The PMOS devices are
built within a well that becomes the voltage control for the circuit. The N well material below
the PMOS gates is depleted of carriers immediately under the gate depending on well
potential. This feature is actually drawn as PMOS devices with N-type source and drain
regions, as the gate capacitance to well is our only concern, and good, low-resistance
connections to both gate and N well are very important to ensure low variable capacitor loss.
In this circuit, the PMOS device is used in its accumulation mode, with the gate more positive
than the underlying N-type substrate.
The tuning range is quite limited. As a result, the inductor values must be carefully adjusted
by trial and error until the VCO is within the desired range. The plot shown is from a SPICE
simulation, which is probably off significantly, as SPICE models do not fully detail losses in
such PMOS devices in accumulation mode.
LC oscillators utilize energy storage within the tank circuit to set the operation frequency,
which can be much higher than that obtained by


MOS devices working against each other or through resistances. We must use MOS circuits, however, to divide the oscillator frequency down for PLL locking, which is difficult in the GHz range unless a high-cost, short-gate process is used. When prototypes must be built to
“zero in” on a good design, fabricating in a process that can be prototyped inexpensively
provides an economic advantage.
The predivider can be built with resistor-load logic, which uses resistor loads in
conjunction with NMOS devices to produce a half-rate differential output. Since the signals
swing over a small voltage range, the currents required to charge capacitances can be smaller,
and the circuits can run faster, as shown in Figures 11.7 and 11.8.
The circuit of Figure 11.7 requires well-aligned complementary clocks, which come naturally from the differential oscillator as low-level sine waves. The output signals are sufficient to drive the next stage, with stages cascaded until the frequency is low enough for standard cell circuits to handle. The divider must be carefully simulated over all possible conditions: signal drive level, resistor tolerances, and so forth.

Figure 11.7 Schematic of high speed predivider using resistive loads and minimal signal
swing internally.


Figure 11.8 Output of predivider shown in Figure 11.7.

The source followers m1 and m3 buffer the differential VCO outputs and drive a differential
pair m5 and m6. When OSC+ is high, m5 turns on and delivers current to m10 and m11,
which are cross-coupled to hold the signal that previously existed on nodes 6 and 7, while m11
and m12 conduct this signal to the next stage. When OSC− is high, m14 and m15 hold this
condition while m8 and m9 carry this signal in the opposite phase, back to the first stage.
The current consumption can be made low, simply by raising resistor values and making the
devices smaller, provided the output is sufficient to drive the next stage. Once at a lower
frequency, a simple amplifier can be used to regain full logic levels. Figure 11.8 is a plot of
nodes 1 and 2, and the outputs DIV+ and DIV−.
What I find most interesting is that devices can be built in inexpensive 0.5- and 0.6-μ
processes to operate in the popular 2.4 GHz band. The apparent limit for 0.35-μ processes
using these techniques is around 5 GHz.
The design of high-frequency, voltage-controlled, low-phase noise oscillators with on-chip
inductors is still an experimental art. SPICE can help, but only with losses added manually to
the components; for example, SPICE cannot determine the eddy current loss within the
substrate beneath an inductor, or the resistive losses in an accumulation mode PMOS device
used as a voltage variable capacitor. Experimentation is required to find the best structural and
layout geometries.
Fortunately, even high frequency circuits such as this can be prototyped in 0.6 μ CMOS
through MOSIS at low cost (approx $5000). Many test oscillators, complete with output
dividers, can be fabricated onto a single prototype multiplexed with selection pins to apply
power and


select low frequency outputs for design characterization. More modern processes, with many metal layers, can be used (at considerable additional expense) to provide more conductive multilayer inductors with greater spacing to the substrate and higher Q values.
Experimentation is cheap in the sandbox: a single “shotgun” approach to gathering data is just an inexpensive prototype away, and the wait is time best spent thinking about what to do next.

Quad Circuits and Mixers


When working with RF, the signal and local oscillator frequencies often occupy a very
restricted band of possible frequencies, and although tuning is required for PLL locking to an
exact frequency, most circuits can work with fixed values. A very handy result of differential
signal handling is that quadrature signals can be easily developed, with reasonable phase
accuracy. A quadrature signal is one that is displaced 90° from an original signal and can be
used to advantage when down-converting a band of very high frequencies to a lower range of
frequencies, which can be handled by more traditional circuits. This is the case with receivers, but transmitters can also benefit, up-converting lower frequencies to a specific high-frequency broadcast band.
The mixing of two signals produces sum and difference frequencies. Usually, only one of
the two is desired, but a simple mixer delivers both, as shown in Figure 11.9.

Figure 11.9 Simple mixer.
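The sum-and-difference behavior can be checked numerically. The sketch below is illustrative numpy code with arbitrary tone frequencies, not a model of the circuit in Figure 11.9; it multiplies two cosines and inspects the spectrum.

```python
import numpy as np

fs = 100_000                           # sample rate for this numerical sketch
t = np.arange(0, 0.1, 1 / fs)          # 0.1 s -> 10,000 samples
f_rf, f_lo = 10_000, 1_500             # illustrative input and oscillator tones

# An ideal multiplying mixer: cos(a)cos(b) = 1/2 cos(a-b) + 1/2 cos(a+b)
mixed = np.cos(2 * np.pi * f_rf * t) * np.cos(2 * np.pi * f_lo * t)

spectrum = 2 * np.abs(np.fft.rfft(mixed)) / len(t)   # amplitude-normalized
freqs = np.fft.rfftfreq(len(t), 1 / fs)
peaks = freqs[spectrum > 0.25]         # each product term has amplitude 1/2
# peaks: only the difference (8.5 kHz) and sum (11.5 kHz) tones remain
```

Both tones complete an integer number of cycles over the window, so the two product terms land exactly on FFT bins with no leakage.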


In a receiver, the mixer produces an intermediate frequency (IF) output that can be
amplified with more traditional amplification stages, without causing interference back to the
receiver’s input. The difficulty, however, is that an image frequency is also received, as the
RF input section that drives INP and INN can only reject nearby frequencies to a limited
extent.

It is found that by mixing the received RF signal with the local oscillator to produce a first
IF signal, and also mixing a 90° phase-shifted version of the RF signal with a 90° phase-
shifted version of the local oscillator to produce a second IF signal, each IF signal will contain
the desired signal frequency and the image frequency. However, adding these two signals will
cancel one component, and subtracting them will remove the other. This is a very handy
technique, but requires precise 90° signal relationships to be developed at the RF input
frequency and at the local oscillator frequency.
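The cancellation can be verified numerically. The sketch below is a hedged model with illustrative frequencies: the two 90° relationships are represented as an ideal quadrature LO (a complex mix), rather than any particular polyphase implementation. Each IF path alone contains the signal and the image superimposed; the quadrature combination places the wanted signal at +IF and the image at −IF, where one or the other can be removed.

```python
import numpy as np

fs = 1_000_000
t = np.arange(0, 0.01, 1 / fs)         # 10,000 samples
f_lo, f_if = 100_000, 10_000           # illustrative LO and IF
f_sig = f_lo + f_if                    # wanted RF signal
f_img = f_lo - f_if                    # image, equally far below the LO

rf = np.cos(2 * np.pi * f_sig * t) + np.cos(2 * np.pi * f_img * t)

if_i = rf * np.cos(2 * np.pi * f_lo * t)   # first IF path
if_q = rf * np.sin(2 * np.pi * f_lo * t)   # second IF path, LO shifted 90 deg

# Either path alone folds signal and image onto the same IF. The quadrature
# combination (a complex IF) separates them onto +f_if and -f_if.
z = if_i - 1j * if_q
spec = np.abs(np.fft.fft(z)) / len(t)
freqs = np.fft.fftfreq(len(t), 1 / fs)

sig_bin = np.argmin(np.abs(freqs - f_if))  # +10 kHz: wanted signal only
img_bin = np.argmin(np.abs(freqs + f_if))  # -10 kHz: image only
```

The spectrum of `z` shows the wanted signal and the image at equal but opposite IF frequencies, each with its expected half amplitude.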

If our signals are all differential, a 90° phase shift is conveniently derived from any differential signal through a polyphase circuit.


The 0° and 180° signals are brought out through R and C components so that loading on
the filter output will be common to all output phases. R is chosen to equal C’s reactance at
the frequency of operation.
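The component choice reduces to one line of arithmetic, R = 1/(2πfC). A small sketch with illustrative values (the 2.4 GHz and 100 fF figures are assumptions for the example, not values from the text):

```python
import math

def polyphase_r(f_hz: float, c_farads: float) -> float:
    """R chosen to equal the capacitor's reactance at the operating frequency."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farads)

# Illustrative (assumed) values: a 2.4 GHz path with 100 fF polyphase capacitors
r = polyphase_r(2.4e9, 100e-15)   # about 663 ohms
```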
One of the more clever ideas in modern radio communication is BPSK signal modulation,
with the receiver locking onto the incoming carrier signal with a Costas loop. In binary phase
shift keying, the carrier is modulated by a binary input signal to invert the carrier waveform
during the expression of 1s, while allowing the carrier to pass noninverted during zeros. The
decoding of the carrier can be developed by multiplying (mixing) the incoming signal by an
internally generated carrier, delivering a positive output when the local carrier is in phase with
the signal, and a negative output when out of phase. To control the local oscillator for this
decoding requires a second demodulation path, again at a 90° phase shift.

This is elegant beyond the norm. When the oscillator is in correct phase alignment with the
incoming signal, the Q output will be zero, because the carrier will be alternating (according to
data) between −90°


and +90° from the shifted oscillator value, resulting in zero output in either case. The I output will carry the information, as it is the direct product of the oscillator and the carrier. If the local oscillator is off frequency, a beat will appear in both the I and Q outputs. The product of I and Q delivers a control signal back to the VCO, locking it to the incoming carrier.
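The locked and unlocked behavior can be simulated directly. This numpy sketch uses illustrative sample, carrier, and bit rates, ideal multipliers, and a per-bit average standing in for the arm low-pass filters; it shows Q near zero and I carrying the data when the phases align, and a data-independent I·Q error term when they do not.

```python
import numpy as np

fs = 1_000_000
fc = 10_000                                # illustrative carrier frequency
bits = np.repeat([1, -1, -1, 1], 2000)     # BPSK data, one bit per 2 ms
t = np.arange(len(bits)) / fs
rx = bits * np.cos(2 * np.pi * fc * t)     # received BPSK signal

def demod(phase_err):
    """I and Q arms of a Costas demodulator with a given LO phase error.
    A per-bit average stands in for the arm low-pass filters."""
    lo_i = np.cos(2 * np.pi * fc * t + phase_err)
    lo_q = np.sin(2 * np.pi * fc * t + phase_err)
    i = (rx * lo_i).reshape(-1, 2000).mean(axis=1)
    q = (rx * lo_q).reshape(-1, 2000).mean(axis=1)
    return i, q

i0, q0 = demod(0.0)        # locked: I follows the data (+/-0.5), Q near zero
err0 = (i0 * q0).mean()    # Costas error term: ~0 when locked
i1, q1 = demod(0.3)        # misaligned: the I*Q product has the same sign for
err1 = (i1 * q1).mean()    # every bit, steering the VCO regardless of data
```

With zero phase error, I recovers the bit signs and Q averages to zero; with a phase error, I·Q is nonzero and data-independent, which is exactly the property the loop exploits.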
This is only an introduction to modern, integrated RF concepts, but hopefully enough to
inspire a project idea and further research. The field of digital communications has virtually
exploded with the use of these advanced techniques, leaving much room for newcomers along
the way.



Converters and Switched Capacitor Techniques


Digital circuits can process data with clever binary mathematics, but somewhere in a
system, real world signals need to be converted to the digital domain, and often back
again. This is where the real value of your ASIC design is centered—in the acquisition
and distribution of real world signals for monitoring, recording, decision making, and
controlling real world processes. An analog-to-digital converter (ADC) quantifies an
input signal to some resolution and at a certain sample rate, while a digital-to-analog
converter (DAC) produces an analog output that represents the binary equivalent
presented to it.

The Ladder DAC


The simplest DAC is the ladder DAC, or R/2R resistor network.


This is a 4-bit example, which can be extended to any number of bits. Each input bit selects whether that bit position’s switch is set to the reference potential (REF) or GND. The output resistance is equal to the R value; each position along the ladder presents this same R-valued resistance, so the network must be terminated at the LSB end with a 2R value. In this 4-bit example, the output ranges from GND, with all switches set to GND, to 15/16 of REF, when all switches are set to REF.
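This division-by-two behavior can be confirmed by reducing the network node by node. The sketch below walks an ideal ladder from the LSB end, forming the Thévenin equivalent at each node (ideal resistors and zero-resistance switches assumed):

```python
def ladder_output(code: int, n_bits: int, vref: float = 1.0) -> float:
    """Ideal R/2R ladder, reduced node by node from the LSB end.

    At each node, the 2R leg to that bit's switch sits in parallel with the
    2R Thevenin source accumulated so far; two equal resistances in parallel
    average the two voltages, and the series R restores a 2R source for the
    next node, so the Thevenin resistance at the output is simply R.
    """
    v = 0.0                                   # the 2R terminator at the LSB end
    for k in range(n_bits):                   # LSB node first
        v_bit = vref if (code >> k) & 1 else 0.0
        v = (v + v_bit) / 2.0                 # two equal (2R) sources in parallel
    return v

# 4-bit example from the text: output spans GND (code 0) to 15/16 of REF
# (code 15), and the MSB alone (code 8) gives exactly half of REF.
```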
When building a ladder DAC in CMOS, the switches are of course MOSFETs, and the reference is most conveniently, but not necessarily, the supply potential. The ladder DAC can be very precise, provided the sheet resistivity and detail etching across the design are uniform and the MOSFETs employed have sufficiently low “on” resistance. When constructing the ladder DAC, you must take device resistance into account, subtracting a slight amount from the resistors connected to the switches to compensate for switch resistance. This is problematic, as the resistances of the P and N devices will differ; they will vary from lot to lot and with supply voltage. The switches operate in the linear region, where supply voltage controls on resistance. The best solution for high precision is to make the switches very large, with typical on resistances that are perhaps 1% of the 2R value, or less. Further, the drivers to
these switches should be considered, as large devices configured as simple inverters will
conduct severe current spikes from the reference during switching. Driving the output
devices with imbalanced inverters will help.

The output devices are driven off more aggressively than they are driven on,
minimizing a reference current spike during transition. With VDD at 5 V and the output
terminated with a 20K resistor to a mid-supply, this driver shows a 5 mV device drop
when pulling to supply or ground. This indicates a switch on resistance of about 40 Ω.
The driver propagation delay is about 300 ps.


Ladder resistors must be of high value to allow for reasonable MOSFET sizes, and the output resistance will not allow loading without error. If reasonable drive current is required, consider a simple inverting amplifier with a feedback resistor.

Establishing a mid-supply with identical resistors across the reference supply allows the
amplifier to deliver a near rail-to-rail output while maintaining the amplifier inputs at a single
voltage level. The output will be inverted, but the amplifier gain will allow output currents up
to the amplifier’s output capability with any resulting error determined by the amplifier’s
transconductance and the load resistance.
The resistors of the ladder DAC should be laid out as regularly and uniformly as possible, as identical strips, with extra care at the MSB end. The most common error in a ladder DAC occurs when the converter input changes by a single LSB from 011111… to 100000…, where resistor mismatching is most profoundly revealed.
It is expected that the analog output from the ladder will be greater with an input code of 100000… than with an input code of 011111…. If this is so at every input code transition, the
converter is said to be monotonic. If not, and this happens particularly at the transition of the
MSB in a ladder DAC, the converter is not monotonic. Simulate your design with careful
attention to the midpoint of the binary scale. Design in a few extra poly resistors at the MSB
end of the array, with no required connections, simply so that the etching process is not
different around these most important devices.
The presence of m1 above poly could cause a slight variation in the resistor value beneath,
due to mechanical stress. It is advised to make connections that overlap the resistors in a
higher level metal layer. Using twice as many resistive strips per bit allows for a more
convenient layout that does not require metal on top of the resistors. Note the extra poly strips
at the MSB end of the array in Figure 12.1 and the slightly shortened resistors as they connect
to the bit drivers at the


Figure 12.1 Layout of poly resistors, including “dummy” strips at MSB end.

bottom of the array. If the resistor array is surrounded by some space to other objects, 8-bit
performance may be achieved, but to obtain consistent 10-bit performance, the resistors must
be physically large.
In high-speed designs, the resistor values are made smaller, the drivers faster, and lower
accuracy is to be expected.
This ladder DAC, and the simple filtering of a pulse width modulated binary output of the
DSM circuit, illustrated at the end of Chapter 8, are the only simple circuits available for
performing the digital-to-analog function. Switched-capacitor DACs can be built that reset an
integrator and then apply known charges through capacitors to the integrator input, but the
process requires sampling and holding of the output value, which is problematic in itself. I do
not recommend any switched capacitor circuits for the DAC function, except for the delta-
sigma switched capacitor DAC, which is very useful for narrowband, high resolution
conversion. Switched capacitor techniques will be dealt with later on in the chapter.

Resistor matching
When we need closely matched resistors, as in the ladder DAC, issues of device matching
become critical. Commercial processes rely on either diffused silicon or deposited polysilicon
for resistor construction. In the case of diffused junctions, the silicon is quite uniform, but the
doping has minute variations from location to location. Polysilicon consists of numerous
submicron dimensioned crystallites with variable conductivity between crystallites. In general,
the resistive materials display a random variation in specific resistivity, which is size
dependent.


Sheet resistivity can be determined from very large test structures, which provide a mean value for the process. Resistors drawn to be very small, however, are more sensitive to etching variations than larger ones, and they also show statistical variation from device to device, even when fabricated next to each other with identical geometries. We can imagine a single 1 μ² area of resistive material; it will have a value reflecting the sheet resistivity of the process, plus or minus a small amount that varies from one 1 μ² area to another, due to the material’s lack of absolute homogeneity. If four such areas are combined into a single 2 μ × 2 μ square, the resistance will be the same, but the averaging of the random errors of the component 1 μ² areas results in a statistical variation from the mean that is half that of a single 1 μ² area.
Such variation can be defined as the 1-sigma variation from the mean for a 1 μ² structure; the designer can then estimate the expected statistical variation of any drawn feature by dividing the 1 μ² error by the square root of the resistor’s drawn area. Typical 1-sigma variations for diffused resistors are on the order of 1% to 2% for 1 μ², while polysilicon resistors may be as high as 2% to 4%.
To obtain the close matching tolerances required in the ladder DAC, large areas of diffused resistors may be needed, or even larger poly resistors. To achieve 8-bit performance, with the expectation that less than 1% of die will fail a monotonicity test due to resistor mismatching, statistical variations at the 3-sigma level must be within tolerance. This forces the required area of diffused resistors to be on the order of 100 μ² each, and if poly resistors are used, perhaps 500 μ² each. The primary problem when attempting to achieve high accuracy in a ladder DAC is this unavoidable but predictable matching limit. The critical resistors at the MSB end of the ladder can require prohibitively large areas; diffused resistors for use in a 12-bit design would need to be some 20,000 μ² each.
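The area-scaling argument can be turned into a quick estimate. In this hedged sketch, the tolerance is taken as one LSB at the 3-sigma level (the exact criterion is a design choice the text does not pin down); with the 1 μ² sigmas quoted above, the results land in the same order of magnitude as the areas in the text.

```python
def min_resistor_area(n_bits: int, sigma_1um2: float) -> float:
    """Minimum resistor area (in square microns) so that 3 sigma of mismatch
    stays within one LSB of an n_bits ladder, given the 1 square-micron sigma.
    Mismatch sigma scales as sigma_1um2 / sqrt(area)."""
    lsb = 1.0 / 2 ** n_bits
    return (3.0 * sigma_1um2 / lsb) ** 2

a8_diffused = min_resistor_area(8, 0.01)     # ~59, same order as the ~100 quoted
a8_poly = min_resistor_area(8, 0.04)         # ~944, same order as the ~500 quoted
a12_diffused = min_resistor_area(12, 0.01)   # ~15,000, near the ~20,000 quoted
```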
Finally, when considering diffused resistors in dynamic circuits, remember that the structure is that of a reverse-biased substrate diode. As such, the resistor is nonlinear with applied voltage: the resistance rises with reverse bias. The well feature, although bulky due to the design rules that must be applied, may be attractive for its high resistivity. The well is actually a very deep, low-dopant-concentration junction that shows extreme variation with applied voltage, especially if the resistor is drawn very narrow. At the usual minimum well width dimension, the well resistor approaches a junction FET, with the substrate acting as a gate, and will approach pinch-off at resistor


potentials on the order of the supply voltage. Be very careful when using the well as a resistive
layer. Also, try to obtain voltage coefficients for diffused resistors from your foundry.

ADCs
Despite consumer applications that output audio and video as analog signals, which surely require DACs, and despite the huge production volumes of these limited-application devices, it is far more common in ASICs for analog signals to be converted into the digital domain. For this, many techniques are available.

Successive approximation ADC


The process of producing an analog output from a ladder DAC is like the process of
multiplication; the output will equal the reference voltage times the binary code applied. With
this in mind, the ladder DAC can be seen as a binary attenuator of the reference signal.
Analog-to-digital conversion can make use of the ladder concept, but in this case it becomes more like division; an algorithm must successively approximate the analog input value through trial-and-error bit values sent to the ladder DAC, as shown in Figure 12.2.
The successive approximation register (SAR) is a logic element that drives the DAC,
depending on the logic condition at its input. The SAR is first initialized into a starting
condition on a rising clock edge while RESET is active. This forces the SAR to produce a
midscale output to the DAC, with the MSB set and all other bits low; the DAC produces a
midscale analog output. The comparator determines whether the input value is greater than or
less than the DAC output, and on the next clock rising edge this value is set into the SAR
MSB, while the next lesser-significant SAR output bit is set high. The process continues until
all

Figure 12.2 Block diagram of successive approximation conversion.


Figure 12.3 Successive approximation register structure, which can be extended to any
number of bits.

bits have accepted the comparator output, whereupon the cycle can repeat. A 4-bit example of
an SAR is shown in Figure 12.3.
The SAR can be carried on to any number of bits, and can be clocked at a high rate.
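The register's algorithm is easy to express in software. Below is a hedged behavioral sketch of the SAR loop against an ideal DAC and comparator, with no circuit non-idealities:

```python
def sar_convert(vin: float, vref: float, n_bits: int) -> int:
    """Behavioral successive-approximation loop against an ideal ladder DAC."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):        # trial bits, MSB first
        trial = code | (1 << bit)                # set the bit under test
        dac_out = trial / 2 ** n_bits * vref     # ideal ladder DAC output
        if vin >= dac_out:                       # comparator decision
            code = trial                         # keep the bit
    return code
```

For vin = 0.6 with a 1 V reference and 4 bits, the trials are 0.5 (keep), 0.75 (drop), 0.625 (drop), and 0.5625 (keep), giving code 9.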
Most ADCs operate on a fixed input value that is captured at a moment in time by a sample
and hold circuit. If the input signal is known to be changing very slowly, no more than an
LSB of converter resolution over the successive approximation process period, then a sample
and hold circuit will not be required. If the signal is changing quickly, then not only a sample
and hold circuit is required, but an anti-alias filter as well.


Figure 12.4 Sampling of a sinusoidal waveform at 8X rate.

The switch is turned on briefly so that the capacitor can acquire the potential of the signal
from the low-pass filter. When the switch turns off, the capacitor holds the stored charge,
which is buffered by the amplifier. The switch is an MOS transmission gate driven by control
signals. The switch must have turned off and the amplifier settled to a stable point before the
successive approximation process can begin.
The alias process is easily understood. If a 1 kHz sine wave is sampled at 8 kHz, as shown
in Figure 12.4, the waveform is reasonably acquired.
The output is held for the conversion period between sample moments, and the converted
samples fully describe the input waveform at the 8 kHz sample rate.
In Figure 12.5, the output appears to be at 500 Hz, which is the alias frequency of 1 kHz
when sampled at 1.5 kHz. Any input signal above Fs/2 must be removed by a low-pass filter
prior to sampling, which is very difficult using analog techniques. Since the roll-off rate of
simple filters is quite limited, the filter cutoff frequency must be placed at a much lower
frequency than Fs/2 to ensure that high frequency input signal components do not alias into
the sampled output signal. Later, we will see how digital techniques can be used with oversampled converters to accomplish anti-alias filtering in the digital domain, where filter cutoffs can be sharp and well controlled.
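The folding arithmetic can be written down directly; this small sketch computes the apparent output frequency for the two cases shown in Figures 12.4 and 12.5.

```python
def alias_frequency(f_in: float, fs: float) -> float:
    """Apparent frequency of a sampled tone, folded into [0, fs/2]."""
    f = f_in % fs                      # sampling repeats the spectrum every fs
    return fs - f if f > fs / 2 else f # reflect the upper half of the zone

# alias_frequency(1000.0, 8000.0) -> 1000.0 (well sampled, Figure 12.4)
# alias_frequency(1000.0, 1500.0) -> 500.0 (the alias of Figure 12.5)
```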


Figure 12.5 Sampling a sinusoidal waveform at 1.5X rate.

Flash conversion
In cases where the sample frequency is very high, the input signal can simply be compared to
multiple taps on a resistor divider to obtain an output virtually instantaneously. The structure
is usually restricted to a relatively few output bits, as the number of resistors and comparators
required grows as 2 to the power of the required output bit width. Nonetheless, 4- to 8-bit flash
converters are useful in communication and video applications. The comparators used can be
extremely simple, but their offset voltages must be low, preferably below an LSB of input signal range. The priority encoder must receive the 2^N − 1 comparator outputs and produce an
N bit result quickly; pipelining registers can be placed at the inputs and outputs of the priority
encoder so that while the comparators are settling on a new input value, the priority encoder
can be deriving a binary value from the previous comparator states. The basic concept is
illustrated in Figure 12.6.
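The conversion itself can be modeled behaviorally. In the sketch below — an idealized model — counting the set bits of the thermometer code is mathematically equivalent to the priority encoding described, though the hardware uses the gate-level encoder of Figure 12.8.

```python
def flash_convert(vin: float, vref: float, n_bits: int) -> int:
    """Ideal flash ADC: 2**n_bits - 1 comparators against evenly spaced
    resistor-divider taps, then thermometer-to-binary encoding by counting."""
    levels = 2 ** n_bits
    taps = [k * vref / levels for k in range(1, levels)]  # divider taps
    thermometer = [vin >= tap for tap in taps]            # comparator outputs
    return sum(thermometer)

# e.g. a 6-bit converter from a 3 V reference: 63 comparators, ~47 mV per LSB
```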
The input range of the flash converter can be limited to ground and perhaps half supply,
allowing all of the comparators to be identical, with the signal levels well within their common
mode input range. A fast comparator with features for high-speed operation is shown in Figure
12.7.
Figure 12.7 shows a standard CMOS comparator, but m6 limits the maximum excursion of
node 3, leading to a faster, more symmetrical response. This can be used in flash converters at
sample rates up to a few hundred megahertz.


Figure 12.6 Example of flash converter (shown in part).

The priority encoder is a complicated mass of logic that uses look-ahead techniques to settle
quickly. A block of 8 bits is shown in Figure 12.8.
Multiples of this block can be used in a linear array to encode from comparator outputs to
an N bit binary code. For a 63 comparator

Figure 12.7 A comparator with a clamped internal node to improve recovery from extreme
input signals.


Figure 12.8 8-bit example of priority encoder.

converter, eight such blocks would be required. The bottom block would have its A0 terminal
always set high, which allows a slight change to the bottom block. Carry output from each
block propagates down to lower blocks; the carry in at the top block is set to VDD, and
another change to the schematic may occur here. The lower three output data bits are common
to all blocks; the upper three data bits would come from a ninth block, driven by the ACT
terminals of the eight input blocks. ACT goes high if any input to that block is high and the
bits of all upper blocks are low. This circuitry can be built alongside the comparators, with a row of registers in between, in 8- or 16-bit blocks, which can then be arrayed.
A 64-level flash converter working from a 3 V reference and a 5 V supply in 0.6-μ technology would have an LSB of 47 mV, comfortably larger than the statistical offset variations of even very small comparator devices; the comparator section can be quite small as a result. The individual comparator supply current must be on the order of 300 μA to
achieve a sample rate of 400 MHz. The resistor divider array must be of rather low value so that its time constant, loaded as it is by the comparator inputs, does not affect conversion speed. For a high-speed


converter, expect the current drawn by the resistor divider to be in the same order as the total
comparator current.
An 8-bit flash converter in 0.6-μ technology will require a longer time to settle and larger input MOSFETs to overcome offset issues. The entire converter will draw some 160 mA of total supply current, fit in less than 1 mm² of space, and sample at perhaps 200 MHz.

Low rate ADCs


Many signals are only slowly changing in time, as the case would be with sensor circuits or
user control signals, as from a potentiometer. In these cases, filtering and sampling is not
required. Further, the input to a single converter can be fed by a multiplexer that can access
numerous inputs for conversion, one at a time. Let’s first consider the ramp converter,
illustrated in Figure 12.9.
This converter is based on a counter that runs continuously. C1 is charged by m3, but only
while the counter’s MSB, Q12, is low. When Q12 goes high, C1 is discharged and the flip-
flops F1 and F2 are reset. As C1 charges, the outputs of the comparators X1 and X2 are
clocked into F1 and F2 on the backside of the system clock. As these flip-flop outputs go

Figure 12.9 Example of a ramp converter.


high, the corresponding output register is clocked, capturing the counter’s contents. One
register will contain the reference count value, the other the input count value. Dividing the
input count by the reference count delivers the conversion result.
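The arithmetic of the two latched counts can be sketched as follows. This is a behavioral model only; `slope_v_per_tick` stands in for the m3 charging rate per clock tick and is an illustrative parameter, not a value from the text.

```python
def ramp_convert(vin: float, vref: float, slope_v_per_tick: float) -> float:
    """Ratio of the counts latched at the two comparator crossings."""
    input_count = vin / slope_v_per_tick    # counter value at input crossing
    ref_count = vref / slope_v_per_tick     # counter value at reference crossing
    return input_count / ref_count          # slope and clock rate cancel out

# ramp_convert(1.2, 3.0, 0.0007) -> 0.4, regardless of the uncalibrated slope
```

The point of the division is visible here: the (poorly controlled) ramp slope appears in both counts and cancels, which is exactly why the simpler circuit of Figure 12.10, which removes the division, needs its self-adjusting bias instead.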
This style of converter is reasonably accurate and can be clocked at up to 100 MHz, delivering 12-bit results in a 100-μs conversion cycle, or perhaps 8-bit results every 5 μs. The requirement for a division makes the converter messy, and the need for a bias at m3 that charges the capacitor at the correct rate makes the circuit unattractive.
We can produce a bias circuit for m3 that causes the peak charging voltage to equal the
reference at the moment the counter’s lower bits roll over, which is shown in Figure 12.10.
In this case, the ramp is active over 75% of the conversion cycle, as C1 is reset only when
the counter’s two upper bits are high. The output register is preset during this period, so that
an out of range input value (which will never cause register clocking) will result in a full-scale
output. If the ramp voltage on C1 exceeds the REF potential while Q12 is still low, m7 will
turn on, conducting a small current to pull high on m3's gate; if Q12 ever goes high while the ramp has not yet reached threshold, m4 will turn on, pulling a small current low on m3's gate. These gated currents are set by a global P-type bias voltage (VB) that is mirrored by m9 and m8. M3 must be very large to have a low lambda effect, and it also serves as a storage capacitance for the control process. The system will adjust itself continuously so that the peak ramp voltage hits the REF potential at the moment of counter rollover. No division is required, and the output register's contents can be used directly. Careful simulation, and possibly schematic modification, may be required to guarantee operation over all possible input signal conditions.

Figure 12.10 Control circuitry to ensure the ramp has a peak voltage equal to the reference potential.

Averaging converters
Converters such as the previous ones rely on a comparator to make decisions as to whether a
signal is above or below a given potential, either derived from a charging capacitor or a ladder
DAC output, and are subject to on-chip noise that can cause false decisions, leading to errors.
For slowly changing signals an alternative is available—the averaging, single-channel
converter. Single channel means that, because of an averaging process, such converters cannot
be made to quickly switch from one signal source to another. The averaging process allows
high resolution, but requires significant time to arrive at a high resolution averaged result.
An example is illustrated below.

The output is then taken to a signal averaging process. In this case, the reference is VDD; if
the output bitstream is averaged, the result will be a numerical value that equals the input as a
proportion of VDD. The flip-flop output can be sent to switches that reflect an actual reference
potential if VDD is not adequate as a reference. The terminal MID is expected to be set to one-
half the reference value.


In operation, the amplifier is configured as an integrator, and will produce an output that is
at the input threshold of the D terminal of the flip-flop. A positive input current through R1
(when the input is at a greater potential than MID) will cause the integrator output to fall,
producing zeros at the flip-flop output, which will then force currents through R2 to bring the
integrator output high again. The integrator is integrating the error between the input signal
and the output, forcing the average of the output bitstream to equal the input potential. Since
the amplifier inputs are always at a known potential, the input signal can swing over the full
supply range but the amplifier can be simple, with limited common mode range. The input
current is determined by R1, which can be several megohms in a design, fabbed in a high
resistance poly process. If R2 is equal to R1, the input signal range will be GND to VDD
(REF).
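A behavioral model of this loop, with ideal components and R1 equal to R2, shows the bitstream's average converging on the input as a fraction of VDD; the function name, clock count, and decision polarity are illustrative, not taken from the schematic.

```python
def first_order_dsm(vin, vdd=1.0, n=10000):
    """Behavioral model of the R1/R2 modulator loop: the integrator
    accumulates the error between the input (as a fraction of VDD)
    and the fed-back bit, and the flip-flop quantizes it each clock.
    The mean of the bitstream approaches vin / VDD."""
    integ = 0.0
    ones = 0
    for _ in range(n):
        bit = 1 if integ >= 0 else 0  # flip-flop decision
        integ += vin / vdd - bit      # delta (error), then sigma (sum)
        ones += bit
    return ones / n
```

For a DC input of 0.3 · VDD, the ones-density settles to 0.3 to within roughly one part in the number of clocks averaged, since the bounded integrator state is the only residual error.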
C1 is noncritical—any value is acceptable so long as the amplifier does not exceed its
maximum output signal swing within a single cycle when both the input and the feedback are
of the same polarity. For R1 and R2 of 100K, and at a clock frequency of 10 MHz, 3 pF would
be adequate. The folded cascode amplifier design would work well here, as the bottom plate
capacitance of C1, attached to the amplifier’s output, will compensate the amplifier. For high
accuracy, it is imperative that the signal at the summing node of R1, R2, and C1 has an
average voltage equal to the MID supply; transients due to the switching of the current
polarity through R2 will upset the amplifier as it attempts to dynamically integrate the input
currents. The amplifier's output stage must be more than capable of supplying the required currents. In the example of R1 and R2 of 100 kΩ, the bias at the output
of the amplifier should be at least 100 μA. R1 and R2 values of 10 MΩ can be used in low
power applications, with accordingly small amplifier currents.
This analog circuit is analogous to the digital delta-sigma modulator (DSM) outlined in
Chapter 8, used for converting a binary value to a serial bitstream, with averaging in the
analog domain. Here, we use the same modulating concept to generate a binary bitstream from
an analog signal to be averaged in the digital domain. As with the logical DSM converter, it is
imperative that the rise and fall times out of the flip-flop be as closely matched as possible,
and of as short a duration as possible. Accuracy will suffer when the transition times of the
device driving R2 approach about twice the clock period times the resolution of the
converter. For correct operation, the binary signal entering the averaging filter must precisely
represent the feedback current through R2. Binary 1s and 0s are perfect in their meaning, but
analog signals that suffer transition times are not. If the converter is to be accurate, the analog
value used in the modulator must be identical to its binary equivalent in the digital domain.


Digital averages can be obtained through two primary means—a low-pass filter or a
counter. The low-pass filter will produce outputs at every clock cycle, but will demonstrate a
time constant, as would an RC averaging filter. This may produce a perfectly adequate result
for data acquisition purposes. A counter will produce a defined result, but only after a
specified number of counts has elapsed. The low-pass filter is illustrated in Figure 12.11.
This is a 4-bit example; a practical filter will be a dozen or so bits wide. In this example, we
have an 8-stage integrator, the LSB stages on top, and the MSB stages at the bottom. The
integrator consists of adders driving flip-flop registers, with feedback from the Q outputs back to the
adders again. There is also feedback from the lower (most significant) flip-flop QN outputs
back to the top adder row, which constitutes a shifted, complemented feedback signal. The
upper rightmost adder (LSB) has its carry set to VDD so this QN feedback path becomes a
2’s complement shifted subtract. The input bit is carried into a half adder that provides
feedback sign extension to the lower (MSB) adders and a data bit to the upper leftmost adder.
This is where the input bit is effectively added to the integrator contents, shifted right by 4
bits. The result at Q3-Q0 will be the average of the input bitstream, with a time constant of 16
clock periods.
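In software terms, the filter is a leaky integrator; this sketch models the 4-bit example (the variable names and update form are mine, not the figure's):

```python
def bitstream_lowpass(bits, shift=4):
    """Leaky-integrator equivalent of the adder/flip-flop filter:
    acc += (bit << shift) - (acc >> shift). The upper bits settle
    toward the ones-density of the bitstream, with a time constant
    of 2**shift clock periods."""
    acc = 0  # 2*shift bits wide in the hardware version
    for b in bits:
        acc += (b << shift) - (acc >> shift)
    return acc >> shift  # the 4-bit result in this example
```

A 50%-dense bitstream settles to a result of 8 out of 16; widening to shift = 12 gives the 24-bit, 4096-clock-time-constant version described next.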
Larger counters will provide greater averaging capability, higher resolution outputs, and
longer time constants. A 12-bit converter would require 24 adder flip-flops, and suffer a 4096
clock cycle time constant. Clocked at 10 MHz, the time constant would be about 400 μs.

Figure 12.11 Schematic of low-pass filter for averaging a single bit data stream (4-bit
example).


A counter solution that assumes a master program counter within the ASIC design is
illustrated here.

Again, this is a 4-bit example. The data from the single bit converter is accepted on the
backside of CLK, and gated into a ripple counter if that bit was high. The rising edge of
CLKOUT both transfers the ripple counter output to the output register and simultaneously
resets the ripple counter. The precise timing between CLK and CLKOUT must obviously be
developed carefully.
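The counting process amounts to summing the 1s in each CLKOUT window; a sketch, with illustrative names:

```python
def counter_decimate(bits, window):
    """Count the 1s gated into the ripple counter during each window
    of `window` clocks; each count is one conversion result, and the
    counter resets at the window boundary."""
    return [sum(bits[i:i + window])
            for i in range(0, len(bits) - window + 1, window)]
```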
The inaccuracies that we find with the previous modulator originate with the rise and fall
times of the feedback signal from the flip-flop back to the integrator, which delivers a
feedback signal that is not perfectly represented by the binary value sent to the digital
averager. Further, the integrator circuit is constantly attempting to adjust its output to the
threshold voltage of the flip-flop D input. It is conceivable that at some D input potential, the
flip-flop will be indecisive; it will hesitate at a mid-output level until it flips to one direction or
the other. This metastability will occur occasionally, but the metastable condition will only
persist for an extremely short time—on the order of a nanosecond.
Analysis of the flip-flop shows that the period of time that a flip-flop can be stuck in a
metastable condition depends on the transconductance of the devices within the flip-flop,
acting against internal loading capacitances. The situation is very much like trying to stand a
sharpened pencil on its end—no positioning can provide a perfect balance, but closer
positioning can allow the pencil to stand for a slightly longer time before it falls. SPICE
simulation (options accurate) will show that the range of possible starting conditions that lead
to delayed output from the flip-flop is extremely narrow—when within 1 μV of threshold, the
delay time to output is perhaps twice the normal full-level propagation delay, and the probability of such a close value from the integrator is very low. Nonetheless,
metastability can persist for a nanosecond or so in those rare instances, creating an error. If
switched capacitor techniques are used, high accuracy can be provided, and a delay time can
be inserted to allow for flip-flop metastable states to clear.
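The pencil analogy can be put in numbers with the usual regenerative-latch model, in which an initial imbalance grows exponentially; the time constant and swing below are illustrative values, not simulation results from the text.

```python
import math

def metastable_delay_ns(delta_v_uV, tau_ns=0.2, v_swing=1.0):
    """An initial imbalance dV grows as exp(t / tau), so the time to
    reach a full logic swing is t = tau * ln(Vswing / dV); halving dV
    only adds tau * ln(2) to the delay."""
    return tau_ns * math.log(v_swing / (delta_v_uV * 1e-6))
```

With these numbers, a 1 μV starting imbalance resolves in under 3 ns, consistent with metastability persisting only on the order of a nanosecond.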

Switched Capacitor Converters


A capacitor can be used in conjunction with switches to simulate a resistance. If a capacitor is alternately charged to a potential and then discharged to zero potential, the average current drawn from the potential source will equal that of a resistance of 1/(F × C), where F is the switching frequency. This allows the use of switched capacitors (SC) to simulate resistors directly. A 1 pF capacitor switched in this way at 1 MHz will appear, from an averaged voltage and current viewpoint, to be a 1 MΩ resistor.
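Working the numbers: the charge moved per cycle is Q = C·V, so the average current is I = F·C·V and the equivalent resistance is V/I = 1/(F·C). A 1 pF capacitor switched at 1 MHz therefore looks like 1 MΩ:

```python
def sc_equivalent_resistance(c_farads, f_hz):
    """A capacitor charged to V and discharged f times per second
    moves Q = C*V per cycle, an average current of f*C*V, so the
    equivalent resistance is R = V / I = 1 / (f * C)."""
    return 1.0 / (f_hz * c_farads)
```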

The switches used are almost universally a pair of devices—an NMOS in parallel with a
PMOS—and the clocks shown as A and B are, in fact, differential clocks (AN for NMOS, AP
for PMOS). Further, so that all switches are never on simultaneously, the A and B clock
signals are derived from logic or delay elements to avoid overlapping.
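The nonoverlap requirement is easy to state as a waveform pattern; this sketch (tick counts are illustrative) generates the two phases with a dead time between them:

```python
def nonoverlap_clocks(cycles, high=4, gap=1):
    """Idealized two-phase pattern (unit = one delay-element tick):
    phase A high, dead time, phase B high, dead time, repeat. The
    gap guarantees the A and B switches are never on together."""
    period = 2 * (high + gap)
    a, b = [], []
    for t in range(cycles * period):
        p = t % period
        a.append(1 if p < high else 0)
        b.append(1 if high + gap <= p < 2 * high + gap else 0)
    return a, b
```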

The nonoverlapping periods shown may be exaggerated; in practice the time between clock phases
can be extremely short, on the order of a few nanoseconds. The first averaging ADC can be
done with switched capacitor techniques, as shown in Figure 12.12.


Figure 12.12 Switched capacitor technique of integrating the error between an input signal
and a reference signal, producing a single bit out.

In this example, MIDREF is a potential midway between REF and GND, and is used to bias
the integrator and also as a midreference for the switched capacitor circuits. In many practical
circuits, REF would be VDD, and MIDREF would be a bypassed and buffered signal from a
resistor divider across the power supply.
The input capacitor Cin is charged to the input potential during phase A, while the feedback
capacitor Cfb is discharged. During Phase B, the charge on Cin is delivered to the integrator,
while Cfb is also connected through S7, and connected to either GND or REF, depending on
the last registered condition held in the output flip-flop. The generation of A and B (and their
complements), for driving the CMOS switches, is not shown.
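A behavioral, per-clock model of the charge balance (ideal switches; the sign conventions and capacitor values are chosen for illustration, not extracted from the figure):

```python
def sc_dsm_step(v_int, vin, bit, cin=100e-15, cfb=100e-15,
                cint=1e-12, ref=1.0, mid=0.5):
    """One clock of the switched capacitor modulator loop: phase A
    samples Cin to (vin - mid) and empties Cfb; phase B dumps Cin's
    charge, plus Cfb charged to GND or REF per the last output bit,
    onto the integration capacitor."""
    q = cin * (vin - mid) + cfb * ((ref if bit else 0.0) - mid)
    return v_int - q / cint  # inverting integrator
```

With the input resting at MIDREF, each clock moves the integrator by Cfb·(REF/2)/Cint in the direction opposing the last output bit, which is the feedback action the loop relies on.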
When conveying signals through switched capacitor circuits, stray capacitances can cause
charge errors; often, the value of the capacitors used is on the order of 100 fF, and stray
capacitances are a significant portion of such small values. There are two switch/capacitor configurations that are stray-insensitive by nature: one inverting and the other noninverting. An example of
an inverting structure is that of S1, S2, Cin, S3, and S4 in Figure 12.12. The capacitor is
charged to the input potential relative to a common terminal (in this case MIDREF), but the charge is delivered in the opposing polarity. The junction of S3 and S4, however, is always at the
common potential of MIDREF, since the amplifier presumably has only a small offset.
Because the potential at this node is not changing appreciably during operation, any stray
capacitance at that node has a negligible effect on the charge delivered to the integrator. Also,
in the case of Cfb, the junction of S6 and S7 behaves similarly. The lower portion, consisting of S5, Cfb, S6, S7, and the two transistors, constitutes the noninverting, stray-insensitive structure.
It’s easy to imagine using a switched capacitor structure like the one illustrated here:

The problem with this design is that the charge delivered to the integrator will not simply be the input voltage times the C value; it will also include charge on the source/drain capacitances of the switches, and on any other capacitance that interconnects the switches to the capacitor. In the case of the source/drain junctions, the capacitance changes with applied voltage, leading to nonlinearity in the relationship between input voltage and delivered charge. This capacitor structure will actually introduce distortion into the converter characteristic.
The switched capacitor converter is considerably different from the resistor version. The
switched capacitor device constitutes a sampled system, whereas the input through a real
resistor is a continuous time process. Sampled systems are extremely sensitive to noises on the
input or within the circuit substrate. A signal propagating through a resistor into an integrator
is very forgiving in the presence of high frequency noise, naturally filtering out high frequency
components. The switched capacitor filter, however, can capture a single noise event and carry
its value into the integrator as a fully valid signal.
During input sampling, Cin charges through S1 and S3 and acquires a potential that is the
difference between IN and MIDREF. Noise may exist on the IN or the MIDREF signal lines, though, and the potential across Cin will track this
noise. At the moment S1 and S3 turn off, whatever errors are present are fully delivered to the
integrator during the B phase. Due to the high switching speed of the CMOS devices, and the
time constants between the on resistance of the switches and the value of Cin, the bandwidth
over which noise can be folded (aliased) into the baseband (DC to Fs/2) can range up to
hundreds of megahertz. Special consideration must be given to switched capacitor circuits in
this regard, especially ones that operate in a high noise digital environment (which is almost
always the case). To overcome such noise issues, differential techniques are used.

Differential switched capacitor structures


When signals can be developed differentially for SC processing, or at least processed with SC
circuits differentially, the improvement in signal-to-noise ratio is astonishing. Early switched
capacitor circuits, operated in relatively quiet environments as single-ended designs, were limited to perhaps a 70 dB dynamic range, but more recent differential techniques, even in the presence of significant processing noise, perform at thermal noise levels, with a dynamic range on the order of 120 dB! The circuits required have their own difficulties, which I will
attempt to explain.
First of all, if we make an amplifier with differential inputs and differential outputs, we
need a means of establishing a common, average potential for the outputs; they may be
providing a correct differential output difference value, but without common mode control
they could do so anywhere between supply and ground, which may not be appropriate for
succeeding SC stages.
The amplifier of Figure 12.13 is intended to be operated with a mid-supply potential applied
to MID, and an external circuit (Figure 12.14) that connects a resistor from each of the outputs
to the internal node labeled CM. These resistors are usually switched capacitor circuits, with
additional capacitors connected across them so that changes in average output potential are
immediately conveyed to the common mode (CM) input. The resistors are not shown in the
amplifier schematic, as they are most often implemented with switched capacitors in the
application circuit.
The fully differential, folded cascode amplifier has one wonderful characteristic—that of
wide bandwidth and great stability, as it has no high resistance, capacitively loaded internal
nodes in the differential signal path; and its one ugly characteristic is that of poor common
mode stability. The internal common bias point at m9, m10, and m22 causes a common mode
instability that is difficult to control.


Figure 12.13 Fully differential folded cascode amplifier for switched capacitor filters.

There are generally two approaches to common mode control: brute force, with a large
current through m19 and m20, forcing m22 to display a lower dynamic resistance and pushing
its corner frequency higher, as illustrated in Figure 12.13; and putting very small currents
through m19 and m20, expecting that the RC at the drain of m22 can become the dominant RC
(shown in Figure 12.15).

Figure 12.14 Common mode control of the output of a fully differential amplifier.


Figure 12.15 Schematic of fully differential, folded cascode amplifier with different common
mode biasing scheme.

In Figure 12.15, low currents through the common mode controlling differential pair m19
and m20, and the loads connected to provide gain, substantially raise the resistance at the drain
of m22, lowering its roll-off frequency dramatically. The first approach to common mode
control is quick and responsive, but without very large diffpair currents, some common mode
ringing will always persist after a signal transient. The second approach draws less power, but
allows the common mode output potentials to wander with poor short-term control. In either
case, if the circuitry is laid out in a careful manner with respect for differential balance, a
small instability in output common mode is often acceptable. Compensation capacitors and
resistors can be applied to help the second design’s common mode stability, but arriving at a
good trade-off between differential speed and gain, and common mode response time and
stability is usually a frustrating task, as two independent objectives are simultaneously sought.
Differential switched capacitor circuits are complicated by the fact that the signal is
traveling through two separate but identical paths, and each amplifier must have common
mode circuitry detailed. The advantages of differential operation, however, are worth the
trouble of pursuit. If carefully designed, they are virtually immune to substrate and power
supply noises, and can allow small signals to be resolved to high accuracy, despite an
extremely noisy background.


A single-stage delta-sigma modulator using differential techniques is illustrated in Figure 12.16.
The inputs are now differential, and the circuit is considerably more complicated. It is,
however, capable of resolving tiny signals within narrow bandwidths. The limiting factors will
be 1/f noise in the input amplifier, stability of the reference potential, the size of the
capacitors, and the switching frequency.
Wait a second here. The whole idea of playing in the sandbox is that it’s simple and fun,
yet we’re getting into stuff here that’s complicated, and quickly—fully differential, folded
cascode amplifiers with common mode control, and switched capacitors with two
nonoverlapping phases of both N and P drive style? OK, the work is obvious, but where’s
the fun?
Figure 12.16 Fully differential implementation of single-stage single bit modulator.

Don’t despair; the end is in sight. There are more issues, and yet more clock phases, but it doesn’t go on forever; there is a limit to the extent of these techniques, and we’re almost there. The wonderful result that comes from
differential techniques is the reward. The settling of differential switched capacitor circuits is
fascinating; the fully differential amplifiers are fast and generally well behaved, and are
excellent at rejecting substrate noise that would completely ruin SNR in a single-ended design.
What we need at this point is just one more dose of the “sandbox spirit” to get there.
Sandbox spirit, by the way, is one of ferocious enthusiasm for acquiring the required
understanding; so get up, stretch, take a deep breath, and steel yourself for what follows.
OK, back to the detailed stuff.
I’ve introduced simple switched capacitor ideas, and there will be a few more details later,
like switched capacitor noise, multiphase clocks, and higher order delta-sigma modulators, but
first, let’s look at how the switching process can help reduce amplifier offsets and 1/f noise
with chopper stabilization—a MOS switching technique that is valuable for sensing low
bandwidth, very low level signals.
The idea is simple, provided we are using amplifiers that have both differential inputs and outputs. Whether the amplifier error is a long-term offset or 1/f noise (which appears as a slowly varying input offset), the amplifier inputs can be connected to a signal source and an output destination in one polarity, which will demonstrate the offset at the output terminals, and then in the opposite polarity, demonstrating the offset in the exact opposite sense. The average of the two states gives a zero-offset result, as illustrated below.

Once the outputs are averaged, a low bandwidth, very low apparent offset (or 1/f noise) characteristic results.
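Numerically, the cancellation looks like this (the gain and offset values are illustrative):

```python
def chopped_output(signal, offset, gain=1.0):
    """Two-phase chopper model: the amplifier's input-referred offset
    adds in one connection polarity and subtracts in the other, so
    the average of the two phases keeps the signal and drops the
    offset."""
    phase_a = gain * signal + offset       # normal connection
    phase_b = -(gain * -signal + offset)   # inputs and outputs swapped
    return 0.5 * (phase_a + phase_b)
```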
The use of CMOS switches and differential amplifiers provides flexibility in design, similar to the RF techniques outlined in the previous chapter. Differential designs are more complicated, but their symmetrical layout is often quite satisfying. There is a certain perfection that differential circuits possess—
a quality that becomes apparent from the schematic onward through to the layout. Once you’ve carefully designed a few differential circuits with attention to perfect balance, single-ended circuits will suddenly seem incomplete and generally flawed in comparison, and you will begin to see all sorts of possible problems with them.
In order to keep differential layouts fully differential (that is, balanced), all that needs to be
remembered is that the stray capacitances that affect one side of the differential circuit must
also equally affect the opposite side. Differential signals will be carried from one place to
another on chip, and always together. A good example of how to connect a pair of differential
lines from M1 to M2 is shown here.

Differential circuits provide handy features, like perfect complement signals wherever
needed, but most important is their ability to reject substrate noise. If the capacitance from
differential signals to substrate is perfectly balanced, then there can be little substrate noise
feedthrough (in the differential sense). Any imbalance gives rise to signal contamination from
substrate noise. The balancing act all happens at the layout level—placement of shielding
features, running of signal lines, and even the placement of differential bonding pads and
symmetrical GND or VDD pads on each side. The ability of a differential circuit to reject
substrate noises is entirely up to the layout engineer, who, in the sandbox, is also the designer.
One note of caution, however: Since differential circuits will have signals running through
them, this in itself will present an imbalance. Since the device diffusions have a nonlinear
capacitance to substrate or well, the electrical imbalance can cause substrate feedthrough, no
matter how well the design is quiescently balanced. In any case, a quiet substrate or, more
particularly, a substrate that only carries synchronous noises is desired. By “synchronous” it
is meant that if other high transient current devices are operating on-chip, such as memories or processors, the pulses of current drawn by these devices should be regular and at the same
frequency (and fixed phase relationship) to the switching of the SC device. Noise in a switched capacitor environment is only truly noise if it is not correlated with the process at hand; if a substrate disturbance occurs while a switched capacitor circuit is settling, it may be acceptable, provided everything is settled and quiet at the exact moment the switches turn off. SC switches can even turn off at a substrate-disturbed moment, provided the substrate disturbance is identical on each such switching event. It is through such synchronization that sensitive circuits can coexist with fast logic circuitry.

High-ordered delta-sigma converters


Delta-sigma converters are named so because they take the difference between the input and
the output (delta) and successively sum (sigma) these errors to obtain a new output condition.
The delta-sigma modulator (DSM) used for generating a single bit for RC filtering into a continuous time signal, the resistor/integrator ADC with a single bit output intended for averaging to a high accuracy binary value, and the switched capacitor equivalent all utilize a single integration function. Further, since the input bandwidth
of any of these is a tiny fraction of the modulator clocking rate, these are appropriately called
over-sampling converters. When the signal bandwidth is narrow, such that the clock frequency
to a switched capacitor circuit can run at a much higher frequency than the upper limit
frequency of the signal, then oversampling techniques can be used to good advantage.
The signal-to-noise ratio (SNR) of a converter is the ratio of maximum signal amplitude to
residual noise. In the case of the single integrator DSM ADC, after averaging through a low-
pass filter, the results become more accurate than the single bit at the filter’s input, but time
is required for the filter to settle to a correct result. In effect, we have limited the bandwidth of
the output signal and in return, we have increased its SNR. Further limiting of bandwidth, by
lowering the cutoff frequency of the filter, delivers a lower noise level and potentially higher
resolution.
In the single integrator (also called single order) modulator, the integrator is presenting an
average of the error between the current output bit and the input signal voltage to the output
flip-flop, which produces the next output bit. The integrator is acting like a low-pass filter,
forcing the error between the input and the output to be lower at low frequencies than at high
frequencies. You could say that the error-reducing function of the filter favors low
frequencies, because it has more gain at low frequencies. This must be the case, because we find the accuracy increasing as the following filter is set to a lower frequency. If we use two integrators in series, we would have yet more error-controlling gain at lower frequencies. This would be the idea, drawn as an RC implementation, and single-ended for clarity, as shown in Figure 12.17.

Figure 12.17 Second order delta-sigma modulator.
In this case, since the integrators are inverting in nature, A3 is effectively summing the
integrator outputs, as the output of A2 is out of phase with that from A1. We find that the
noise from this second order modulator falls with frequency more dramatically than that of the single order modulator; we can use a higher-ordered post-filter to achieve better SNR, and over a less limited bandwidth.
Multiple integrator delta-sigma modulators are also called noise shaping converters, as we
can see that the quantization noise from the single bit output is being rejected from the lower
frequencies, and being pushed into higher frequency bands that do not pass easily through the
integrators to the output flip-flop. The higher frequency noise components in the binary
bitstream must be filtered out from our output signal, requiring a steep, higher ordered output
post-filter.
As we add more integrators to the modulator, the noise will be reduced further at the lower
frequencies, but instability can occur as we sum the outputs of the integrator stages for
quantization at the flip-flop. A simplified third order modulator is shown in Figure 12.18.
Here, I’ve inverted the output of the second integrator so that the three stages can be
summed to the positive input of A5. Since each integrator stage is inverting, the outputs of A1
and A3 will be in phase with each other, but that of stage 2 is not. Using differential techniques, such
complications are not required, as both true and inverse outputs appear at each stage.

Figure 12.18 Third order delta-sigma modulator.

The third order
modulator is only stable if the integration time constants (R × C) of the second and third stages are
longer than that of the first stage, or, correspondingly, if the summation resistors place greater
emphasis on the first integrator output. Still, at high signal levels, the structure becomes
unstable, with the modulator only able to accept an input signal that is something less than the
full signal levels we were able to accept with the single order modulator. The SNR
improvement of higher order modulators is well worth the signal level limitations. Figure
12.19 shows the approximate relationship between over-sample rate and possible SNR.
Figure 12.19 The approximate relationship between signal-to-noise ratio, over-sample rate,
and modulator order.

Over-sample rate (OSR) is the ratio of the sample frequency at the converter input to the
sample rate at the output of the following low-pass filter; only noise over a frequency spectrum
from DC to one-half the output sample rate is
considered. As can be seen from the above plots, single-ordered modulators require a high
sample rate to achieve reasonable SNR, but higher-ordered modulators can ideally reach over
100 dB SNR with modest oversample ratios. These plots are idealized, and not necessarily
realizable in practice; they indicate noise level, but do not reflect maximum signal level. You
might say they are reference-to-noise level plots.
The averaging of data bits gives a 3 dB improvement per doubling of the number of
averaged samples. Using a single integrator in an error-reducing loop, as in the single
integrator DSM, an additional 6 dB per octave is gained above the simple averaging process,
allowing a slope of 9 dB per doubling of the oversample rate. A two integrator system leads to
15 dB per doubling, three integrators lead to 21 dB, and so on.
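The slope rule above is easy to capture in a few lines; a Python sketch (the function name is mine) of the ideal SNR gain from oversampling, at (3 + 6 × order) dB per doubling of OSR:

```python
import math

def osr_snr_gain_db(osr, order):
    # 3 dB per doubling from averaging, plus 6 dB per doubling
    # for each integrator in the loop, per the rule above.
    return (3 + 6 * order) * math.log2(osr)

# A second order modulator at 64x oversampling gains
# (3 + 12) dB/doubling * 6 doublings = 90 dB over Nyquist-rate sampling.
print(osr_snr_gain_db(64, 2))  # 90.0
```

These are the idealized slopes behind Figure 12.19; as the text notes, real modulators fall somewhat short.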
It is quite simple to model the delta-sigma modulator numerically in simple programming
languages. The output bitstream can then be input to a simple Fourier transform to display the
spectral components. I will provide an example in BASIC, as this language is easy to follow.
dim bo(65536)
amp=0.5  ' signal amplitude
for t=0 to 65535
  signal=sin((2.0*pi*t)/128.0)*amp
  int3=int3+int2*0.2
  int2=int2+int1*0.4
  int1=int1+signal-q
  sum=int1+int2+int3
  if sum>0 then
    q=1
  else
    q=-1
  end if
  bo(t)=q
next t

The process is quite simple: establish a variable for each integrator, couple the integrators
together in series with coupling coefficients, send in the signal minus a quantized output value
to the first stage, sum the integrators to a term (I call it “sum”), and decide if that value is
greater or less than zero. I’ve used a large number of iterations for this routine, and set the
input signal to be synchronous with the process—one cycle of signal for every 128 sample
clocks. This is helpful in reducing potential numerical errors, in particular, the truncation of
the bitstream sequence as it enters an FFT. Windowing of the bitstream will most likely be
needed in the case of signal frequencies that are not related to the clocking rate. Notice
the sequence of operations is from the last integrator to the first, which reflects the timing in a
logic or switched capacitor implementation.
The array output of this simple routine can be fed through a window function to a
Fourier transform to produce a spectrum of the single bit output signal.
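For readers who prefer Python, the same model translates almost line for line; a sketch (numpy assumed, states starting at zero as BASIC defaults do) followed by a windowed FFT of the bitstream:

```python
import numpy as np

N = 65536
amp = 0.5                    # signal amplitude, as in the BASIC listing
int1 = int2 = int3 = 0.0     # integrator states
q = 0.0                      # previous quantizer output
bo = np.empty(N)

for t in range(N):
    signal = np.sin(2.0 * np.pi * t / 128.0) * amp
    int3 = int3 + int2 * 0.2   # last integrator first, as in the text
    int2 = int2 + int1 * 0.4
    int1 = int1 + signal - q
    q = 1.0 if (int1 + int2 + int3) > 0 else -1.0
    bo[t] = q

# Window the bitstream and take the spectrum of the single-bit output.
spectrum = np.abs(np.fft.rfft(bo * np.hanning(N)))
peak = int(np.argmax(spectrum[1:4096])) + 1
print(peak)  # the signal sits at bin N/128 = 512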
Each horizontal line of Figure 12.20 represents 20 dB. The spectrum window is from DC to
Fs/16. The signal is at Fs/128, which would be the band edge for a 64X oversampled
converter. Notice the noise is falling quickly toward DC, indicating that a very much improved
SNR would result from using this converter at 128X. After calculating the Fourier transform,
the noise power from DC to the signal can be summed to get an estimate of possible SNR.
Notice the signal is 6 dB down, adjusted in the program with the value “amp.” Larger
signals will eventually cause instability, which can be demonstrated in this simple simulation
setup. The software simulation allows the quick adjustment of coupling coefficients between
stages while testing for instability at high input signal levels.
The spectral response of the bitstream output implies that a digital filter can be employed to
remove all frequency components above Fs/128 (where the signal peak resides), and that the
result can be resampled at Fs/64. One technique for improving the response of the modulator
is to intentionally make the modulator’s filter resonant by adding a feedback value around
two sequential integrators with reversed phase. To do this we will need to store the last values
of the integrator contents so they may be used in the same fashion as the switched capacitor
filter.

Figure 12.20 Fourier transform of single bit data showing signal peak and modulator noise.


dim bo(65536)
amp=0.5  ' signal amplitude
for t=0 to 65535
  signal=sin((2.0*pi*t)/128.0)*amp
  lint1=int1
  lint2=int2
  lint3=int3
  int3=lint3+lint2*0.2
  int2=lint2+lint1*0.4-lint3*0.008
  int1=lint1+signal-q
  sum=int1+int2+int3
  if sum>0 then
    q=1
  else
    q=-1
  end if
  bo(t)=q
next t

The resonant peak is obtained by coupling int3 back into int2, through a negative coefficient
of 0.008.
The resulting spectrum is shown in Figure 12.21.
This resonant peak in the delta-sigma converter leads to a notch in the spectral response,
which improves SNR. Higher-order filters can benefit from this technique, but of course two
integrator stages are required for each resonant “notch.”
To fully model a real world modulator, you may wish to add noise sources into the
algorithm, or clipping levels that would certainly exist in an actual switched capacitor system.
The following would be a fifth order modulator with clipping at ±2 units, with two notches in the
baseband.

Figure 12.21 Third-order modulator with negative feedback coefficient.


dim bo(65536)
amp=0.5  ' signal amplitude
for t=0 to 65535
  signal=sin((2.0*pi*t)/128.0)*amp
  lint1=min(abs(int1),2.0)*sgn(int1)  ' clip each stage to +/-2 units
  lint2=min(abs(int2),2.0)*sgn(int2)
  lint3=min(abs(int3),2.0)*sgn(int3)
  lint4=min(abs(int4),2.0)*sgn(int4)
  lint5=min(abs(int5),2.0)*sgn(int5)
  int5=lint5+lint4*0.08
  int4=lint4+lint3*0.12-lint5*0.007
  int3=lint3+lint2*0.2
  int2=lint2+lint1*0.4-lint3*0.01
  int1=lint1+signal-q
  sum=int1+int2+int3+int4+int5
  if sum>0 then
    q=1
  else
    q=-1
  end if
  bo(t)=q
next t

And the response is shown in Figure 12.22.


The SNR is substantially improved, but notice the out-of-band noise as frequency increases;
the shape of this noise curve will give you clues as to how to “tune” the modulator
coefficients that couple from stage to stage, and that threaten to make the modulator unstable. Be
sure to put the limiting (clip) functions into your simulation, as in the lint lines of the listing above.

Figure 12.22 High-order modulator with two negative feedback coefficients.

This presents a real-world and important limitation; after all, your floating point math may
be able to represent 100,000,000 V, but the switched capacitor filter you implement the
algorithm with can’t.
If you are contemplating a delta-sigma project, I strongly suggest starting with a simple
linear program followed by a quick Fourier transform, as shown above. It is very easy, and
through experimentation with the injection of noises and nonlinearities, which are quite easy
to build mathematically, your understanding of the whole subject will become clear very
quickly.
Delta-sigma modulators can be implemented as switched capacitor circuits for analog-to-
digital conversion, or as logic circuits for the production of a single bit value that can be
postfiltered into the analog domain with a switched capacitor filter, producing a digital-to-
analog function. The same software modeling applies in both cases. Models of the logic
implementation can be numerically truncated to observe the effect of word width on
performance. You will find that the first stage is critical, but subsequent integrator stages can
tolerate successively larger amounts of noise without harm; the final summation is almost
completely immune to noise injection. In a logic modulator, the width of each integrator can
be reduced several bits in width from the previous stage.
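To experiment with word width as suggested, each integrator state can be truncated after every update; a minimal Python sketch (the helper name and widths are mine):

```python
def truncate(x, frac_bits):
    # Mimic a dropped-LSB register holding 'frac_bits' fractional bits;
    # floor division rounds toward minus infinity, like two's complement
    # truncation. A sketch only; real hardware would also saturate.
    step = 2.0 ** -frac_bits
    return (x // step) * step

# Successive stages can be narrower than the first, per the text:
widths = [24, 20, 16]                      # hypothetical per-stage widths
states = [0.3141592, -0.2718281, 0.5772156]
states = [truncate(s, w) for s, w in zip(states, widths)]
```

Sweeping the widths down while watching the baseband noise in the FFT shows how far each stage can be narrowed.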
The single bit delta-sigma modulator is ideal, from the point of view that a single bit can
have only two possible states. A DAC with more than two states, say a 2-bit DAC with four
possible output levels, can never be exact; its accuracy depends on the tolerance of the
components from which the DAC is built. In contrast, a single bit is always perfect.
Modulators can be built using multibit quantizers instead of the simple flip-flop, but the
inaccuracies of the DAC remain.
So, it appears that high-ordered modulators can be built fairly simply, with the promise that
high SNR can be obtained. We can build a logic modulator to drive a switched capacitor filter
to make a DAC, or we can build a switched capacitor modulator to drive a logic decimation
filter to make an ADC. There are other considerations, however, that will ultimately limit
performance, primarily that of noise, in both the amplifiers and the switched capacitors
themselves.

Switched capacitor noise


In switched capacitor circuits, a capacitor is first charged through switches, and then the
charge is delivered through other switches, usually to an integrator input. When switches are
closed, connecting the capacitor to a signal source potential, thermal noise (4KTRF) will exist
within the switch resistance, which will change the voltage across the capacitor from moment
to moment. At the instant the switches open, that noise is effectively sampled and the
capacitor potential will be somewhat in error. When
the capacitor is connected to the input of the integrator, a similar situation exists, with the
delivered charge being again affected by the switch thermal noise, which is again sampled
when the switch turns off.
While the switches are closed, the bandwidth of the noise across the capacitor is limited to
1/(2π × R × C), so if we make the switch resistance smaller (presumably generating less
noise), the bandwidth over which the noise is measured increases proportionately; the actual
switch resistance generates the noise, but the capacitor modifies it. In fact, although capacitors
do not produce noise on their own, the result of switches in conjunction with capacitors does
create noise, which depends solely on the capacitor’s value. This is called KT/C noise.
If a capacitor is shorted with a switch, when the switch opens the voltage remaining on the
capacitor is SQRT[(K × T)/C]. If the capacitor is then to be used in conducting a signal to an
integrator through a second set of switches, it will carry this noise component into the
integration, and when the second set of switches open, their noise contribution will also be
SQRT[(KT)/C]. The result of switching capacitors twice in conducting a signal (which is
always the case), gives rise to a total noise of SQRT[(2 × K × T)/C].
When considering switched capacitor systems, think in terms of noise charge instead of
noise voltage; the noise charge would be C times SQRT[(2 × K × T)/C], which equals SQRT
(2 × K × T × C). This may lead one to believe that reducing the capacitor size would lower
the noise charge, but remember that the signal charge to which the noise is referred is equal to
Vin × C. SNR improves in direct proportion to capacitor size (3 dB per doubling). As we recall that the effective
resistance of a switched capacitor is 1/FC, the switched capacitor circuit will have an input
resistance that scales inversely with the input capacitor value, and the SNR can be calculated by
evaluating the thermal noise of this effective resistance, and adding 3 dB to the noise
component to account for the dual use of the input capacitor. Remember, however, that other
noises can contribute; the second phase of switching, that to the input of an integrator, will
involve input reflected amplifier noises too.
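The SQRT[(2 × K × T)/C] figure is quick to evaluate; a Python check at room temperature (constants are standard values, not from the text):

```python
import math

K = 1.380649e-23          # Boltzmann constant, J/K
T = 300.0                 # room temperature, K

def twice_switched_noise(c):
    # Total rms noise voltage for a capacitor switched twice: sqrt(2kT/C)
    return math.sqrt(2.0 * K * T / c)

# A 1 pF sampling capacitor carries about 91 uV rms of kT/C noise;
# quadrupling C to 4 pF halves it, a 6 dB improvement.
print(round(twice_switched_noise(1e-12) * 1e6, 1))  # 91.0
```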
Since oversampled converter systems are operating at a high switching rate compared to the
sample rate of the baseband signal, a SNR improvement comes about as the result of
averaging. For a given signal bandwidth, doubling of the switching rate allows the use of half-
sized capacitors without suffering a noise increase. High-speed switched capacitor circuits are
therefore ideal for use in an ASIC, as the available capacitor values are limited, but the
switching and amplifying circuits are fast.


Oversampled converter post processing


In the case of an oversampled ADC, the following decimator accepts the high rate single bit
data stream and produces a high resolution, multibit result at a much lower output sample rate.
The decimation filter provides two functions, one to remove high frequency noise components
from the bitstream and the other to limit the range of output frequencies to the band of DC to
half the output sample rate. If the decimator filter cutoff is sharp, the only anti-aliasing filter
required at the analog input will be one that cuts off at a frequency below the over sampling
clock rate, which is usually easily done with a simple, noncritical RC.
The decimation filter is usually a large FIR filter that correlates perhaps 1K to 4K
coefficients against an equal number of input bit values. Since the bits from an ADC
modulator have a −1 or +1 meaning (1 = +1, 0 = −1), the decimator can simply sum
coefficient values from a ROM directly, or in the case of a −1 bit value, the 2’s complement
of the ROM value. The 2’s complementing can be eliminated by interpreting the bit values
as 0 and 1, where the coefficients are simply gated into an adder, and the output is scaled and
offset accordingly. Decimators may be several stages long, but in any case are fascinating
puzzles to solve; the machinery must take in a fast bitstream and continuously output wide
words at the much lower output rate. Multirate machines such as decimators are mind-
boggling but beautiful.
Beautiful? That’s a sandbox assessment of electronic art.
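The coefficient-gating trick is easy to verify numerically; a Python sketch (numpy assumed, and the coefficients here are just an illustrative window, not a designed decimation filter):

```python
import numpy as np

rng = np.random.default_rng(0)
bits = rng.choice([-1.0, 1.0], size=1024)  # stand-in for a modulator bitstream
h = np.hanning(1024)                       # illustrative coefficients only

# Direct correlation against the +/-1 bit values:
direct = np.dot(bits, h)

# Gated form from the text: treat bits as 0/1, sum only the gated
# coefficients, then scale by 2 and offset by the coefficient sum.
b01 = (bits + 1.0) / 2.0
gated = 2.0 * np.dot(b01, h) - h.sum()

print(np.isclose(direct, gated))  # True
```

The gated form needs no 2’s complementing in hardware: each bit either adds the ROM coefficient or it doesn’t, with one final scale and offset.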
The oversampled DAC will require a logic modulator to produce a high rate bitstream that
will be converted to the analog domain by a switched capacitor filter. The input to the filter
will simply be capacitors alternately charged to a reference and connected to the filter input in
a polarity that corresponds to the input bit-stream. The filter’s purpose is to remove high
frequency components from the bitstream, leaving a noise free baseband signal. The output of
a two stage low-pass switched capacitor filter can be averaged through a single continuous
time filter (R and C) to remove any brief switching glitches. When contemplating an
oversampled DAC, consider the order (number of stages) of the logic modulator, and the
effects of noise that will only be removed with difficulty through a simple switched capacitor
filter. In the ADC case, the following decimator can be made extremely sharp, but in the DAC
case, the following switched capacitor low-pass function can only be made sharp with great
difficulty, as it is low-ordered and simple.


More clock phases


As promised, there are more clock phases to consider. If you really want the most from your
switched capacitor project, you should consider advanced clock phases as well as the normal
ones indicated earlier. To appreciate the need for advanced clocks, consider the following
circuit.

S1 and S3 conduct on one clock phase and S2 and S4 on the other. When the switches turn
off, charge is coupled from the switch drive signal to the switch terminals. This switching
feedthrough is signal dependent. Consider S1: the potential of the input will bias the source
and drain nodes of the N and P switches that constitute S1, and as their gate voltages switch to
turn the devices off, charge will be coupled differently through the two switches, depending on
the input voltage. If S3 turns off first, and only then does S1 turn off, the effect of this signal-
dependent charge is minimized. Also, notice the bottom plate of C1 is attached to the input
side. This stray capacitance will also help minimize the effect. The nodes at S3 and S4, the top
side of the capacitor, should have as little capacitance as possible to other nodes, including
ground. Never put the input signal into the top plate of the capacitor.
This suggests that we have several clock phases, and the means for generating them. The
process is simple, using a few logic delays and some gates. Figure 12.23 shows the simplest
scheme, where some simple delays (series of inverters) and gates provide the proper clock
delays and nonoverlapping quality.
The switch control lines should be generated in an electrically quiet area, with a clean VDD
supply—usually the same supply used to power the switched capacitor filter’s amplifiers.


Figure 12.23 Timing circuit for deriving nonoverlapping clocks for use in switched capacitor
processes.

Please do not be put off by the apparent complexity of fully differential switched capacitor
circuits; although entire books could be written on the subject (and have been), the basic ideas
herein are really all that is needed. That, and some serious time devoted to various simulations,
will reveal the details and provide deeper understanding.


Packaging and Testing


When contemplating a project, an important concern is package type and pin count, and
it’s doubtful that your minimized design will have exactly the same number of pins that
packaging houses make available. No doubt, an extra GND or VDD will get tossed into
the design, or an extra pin to help with test may find its way in toward the end of the
project. Once pin count has been established, you need to make sure your die will fit the
package, which is difficult to know until the design is done. Further, you must place your
bonding pads according to the leadframe connections, so that long or extreme angle
bonds will not be necessary. This means you must finish your design before you can
finish your design…. Obviously, the process will start with one package in mind, but
perhaps move to another as the design becomes clearer (or the die becomes larger, which
causes a major change in plans!).
SMT packages of the dual-row type, such as the SOIC, TSSOP, and so forth are very
convenient, as they can be packaged in tubes or tape and reel, but the larger quad-flat
packages cannot be packaged in tubes, which may increase handling costs. The newer
QFN packages have no leads extending from the package, but the die size is usually
limited to small devices. Ball grid arrays and flip chips are difficult to work with in the
prototype stage, but deliver the ultimate in packaging density.
For low-cost products, you can have assembly houses (not packaging houses) mount
your die onto the product PCB and leadbond the devices to PCB traces directly, called
chip-on-board (COB). For this, they will require hard gold plating (gold over nickel) on
the PCB, at least in areas where bonding is to be performed. Usually such assemblies can
be tested after leadbonding, and are cheap enough to be thrown away if a failure is found
in test. The die is encapsulated onto the board with a very hard epoxy “blob,” and the
entire process is quite cheap, usually under half a cent per lead. The PCB pattern can usually
abide by minimum PCB layout rules, which
should be acceptable to the leadbonding process.
Packaging houses are about as picky as class 1 fabs when it comes to distributing
information. I suggest finding a packaging house early on in the project, as they all have
their own leadframe and molding tooling. There is no standard package; they all may look
the same from the outside, but the punched leadframes are a bit different from company
to company. The packaging house will have invested in its own proprietary leadframe
designs, each to accommodate different-sized die, and are very protective of the
leadframe drawings, to an extent that can be frustrating to a new IC designer. Information
on leadframes is crucial to a new project, but getting the information so that you may
place your pads with confidence is difficult. Be sure to get the intended package
leadframe drawings before beginning the project, and perhaps a few variants so you have
options as the project proceeds.
In the experimental stage, when contemplating a new project, you may simply obtain
standard parts that are packaged the way you imagine your new ASIC would be, and with
the sandpaper technique, grind down to the leadframe to make measurements and at least
see what’s possible. Why packaging houses would protect information that can be
obtained with a piece of sandpaper is beyond me, and no reasonable explanation has yet
been offered.
The packaging process is highly automated and due to fierce competition, quite
inexpensive, at least for standard packages. The wafer will be received by the packaging
house, back-ground (lapped) to about one-third the original wafer thickness, and diced with a high-
speed diamond saw into individual die. The die are mounted onto leadframes, bonded out
with gold wire, encapsulated with a high temperature mineral-filled thermosetting
molding compound and marked with a laser or an ink stamp. A pessimistic rule of thumb
for the entire process is about half a cent per pin plus maybe 4 cents per part. The cost of
tubes or tape and reel packaging is separate, but usually under 1 cent per part.
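That rule of thumb translates to a one-line estimator; a Python sketch (the function name and the lumping of the carrier cost are mine):

```python
def package_cost_usd(pins, carrier=0.01):
    # Pessimistic rule of thumb from the text: half a cent per pin,
    # plus ~4 cents per part, plus tube or tape-and-reel carrier.
    return 0.005 * pins + 0.04 + carrier

# A 44 pin quad-flat comes to roughly 27 cents of packaging cost.
print(round(package_cost_usd(44), 2))  # 0.27
```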
It is amazing that any company would accept extremely fragile 8 in. wafers of single
crystal silicon that have been processed on the top side with multiple layers of submicron
details and grind the back side off to a fraction of the original thickness for under $10. It
is conceivable that one mishandled wafer could wipe out the entire year’s profits.
Actually, the equipment and wafer mounting technology is well established, which allows
this to be a minor part of the packaging process.
Dicing is performed by a circular saw. This consists of a very thin metal ring that is
embedded with diamond dust and clamped into a precision chuck supported by air
bearings that rotates at about 30,000 RPM. The wafer is mounted on a flexible, adhesive-
backed plastic material and held in a vacuum chuck during dicing. The wafer die are on
regular centers, so the machine can be programmed to make cuts automatically at the correct
pitches on the X and Y axis independently. Cooling water is applied during the cutting
operations. The cuts are made through the wafer and slightly into the plastic backing.
The width of the dicing cut can vary, but is usually on the order of 35 to 50 μ. This allows
ample room down a 100-μ street between die for some maladjustment. The foundry will have
placed their PCM structures between your die, and these features often require somewhat more
than the 100-μ spacing that exists where PCM structures are absent. From knowledge of the
wafer layout, the PCM structure widths and the width of the cutting saw blade, a final die size
can be calculated, which will be important when fitting the finished die into the chosen
package.
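That calculation is a small one; a Python sketch with assumed numbers (a 40-μ kerf centered in a standard 100-μ street):

```python
def finished_die_edge_mm(drawn_um, street_um=100.0, kerf_um=40.0):
    # Die repeat on a pitch of drawn size plus street; the saw removes
    # the kerf from the street, leaving (street - kerf)/2 on each edge.
    return (drawn_um + (street_um - kerf_um)) / 1000.0

# A 3000 x 3000 um drawn die finishes at about 3.06 mm per side,
# before accounting for the wider PCM streets on some edges.
print(finished_die_edge_mm(3000.0))  # 3.06
```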
The flexible, adhesive-backed plastic support used during dicing can then be used to carry
the die into the automatic placement machinery, where the plastic backing is stretched to a
significantly larger diameter than the original wafer, allowing increased space between die.
Here, the die are chosen (avoided if marked with a wafer-probe reject ink dot), and
automatically picked and placed onto leadframe pads. The leadframe pads have been
previously prepared with small dots of epoxy, which will bond the die to the leadframe pad.
Silver particle-filled epoxy is available with the thought that this will make good electrical
contact between the die and the pad. However, measurements on the epoxy’s resistivity are
done using highly conductive electrodes, but the backside of a silicon wafer is far from a
good conductor. As the silver particles will be only contacting the silicon die at random spots
across the backside of the die, do not expect a good electrical connection, unless perhaps the
wafers were fabbed on epi substrates.
Leadframes for high production are punched from a continuous strip of metal with a high
precision tool that can be very expensive (shown in Figure 13.1). Provided the tool is made
from extremely hard materials, its lifetime can be in the millions of units, which leads to good
per-part economy. A high production, 208 pin leadframe punching tool can cost nearly a
million dollars, but provide 50,000,000 parts in its lifetime, which is maybe one year.
If a custom leadframe is required, it can be produced through an etching technique, although
the per-part cost is high compared to punched leadframes. The cost of tooling for etched
leadframes is in the range of $5000 to $10,000, which includes a tool for down setting the pad
area. This is so that the surface of a backlapped, mounted die is at the same level as the
leadframe elements that the part is bonded to, and leaves more room for lead wire “looping”
above the die.
The center pad area of the leadframe must always be larger than the die that is mounted on
it; usually 100 to 150 μ is required all around. The minimum leadframe widths and spaces can
be provided by your packaging house for custom leadframe development, and they can
coordinate the entire custom tool manufacture from your full specifications or just your
general intentions.

Figure 13.1 Leadframe, as punched strip and in finished package.
After leadbonding, the parts are manually placed into a steel compression molding die,
usually in strips that contain multiple bonded sites. The tool is closed, placed into a hydraulic
press, and heated. A pre-heated “puck” of mineral-filled thermosetting epoxy is inserted
into the press, and is heated to a high temperature, whereupon it flows with a low viscosity
and under high pressure into the mold cavities. After tens of minutes, the parts can be
removed, although they will require a postmold bake period of several hours to fully cure the
molding resin. The tools for this molding operation are designed for a particular package’s
outer dimensions, and can be used with custom-designed, etched leadframes as well as the
standard leadframes, so that a custom bonding arrangement inside the package can make use
of already available molding tools.
Finally, the parts are cut from the leadframe strips with a tool that shears the leads from the
leadframe carrier and forms the leads to the final shape. Custom leadframes can be designed to
share this tool as well.
After postmold baking (and sometimes before), the parts are marked with the customer’s
logo, part identification number, and lot code, through one of two primary processes: a pad-
printing technique or laser marking. The pad-printing process requires a simple tool that must
be changed as date codes change, but it is inexpensive, and the pad-printed images are clear
and easy to read. Laser marking only requires a simple programming step, but the final printed
image is not as easily readable.


Parts can then be packed out into tubes in the case of dual row packages (perhaps the least
expensive method), or placed into trays in the case of quad-flat parts. After testing, the parts
may be put into tape and reel cavities, tubes, or chip carriers for final shipment. If test is
performed within the assembly facility, and the parts are of a dual row type, they may be put
into transient metal tubes for conveyance to the test facility; quad-flat packages will need chip
carriers, which require robotic handling.
The packages of most interest to sandbox players would be the dual row family, such as the
SOIC and TSSOP, the various quad-flats, and the newer packages of the QFN variety. This
latter type is extremely small, shows very low lead inductance and capacitance, but does not
support a very large die unless high pin counts are accepted. The whole thrust of the sandbox
notion is that silicon is pretty cheap, that much can be done with older technologies at a low
tooling (mask) cost, and that, when yields are considered, even fairly large die (up to 100 mm²)
are quite feasible. Further, if your project is small in terms of production rate, a small die will
result in very few purchased wafers and a poor foundry relationship; larger die should not be
cost prohibitive at a low production rate, and much circuitry with grander features can be fit
into larger die, even in an older technology.
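The claim that large die remain feasible once yield is considered can be sketched with the classic Poisson yield model, Y = exp(-A × D0); the 0.5 defects/cm² defect density used below is an illustrative assumption, not a figure from the text:

```python
import math

def poisson_yield(die_area_mm2, d0_per_cm2):
    """Classic Poisson yield model: Y = exp(-A * D0)."""
    return math.exp(-(die_area_mm2 / 100.0) * d0_per_cm2)

# Assumed defect density of 0.5 defects/cm^2, plausible for a mature process.
for area in (10, 50, 100):
    print(f"{area:3d} mm2 die: yield ~{poisson_yield(area, 0.5):.0%}")
```

Even a 100 mm² die yields above 60 percent under this assumption, which is why larger die in an older, cheaper technology need not be cost prohibitive.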
If you can sketch out a leadframe with 0.2-mm lead widths and 0.2-mm spaces, it can most
likely be tooled as a custom part, and just as likely will be already available as a standard one.
There are, however, certain rules concerning how little the leads project into the package, as
insufficient packaging material around the lead ends will cause the part to be less reliable
under thermal stress.
Here are a few sample packages and maximum die sizes that may help in planning projects
(check with your supplier for exact details):

Package (body size)        Maximum die size (mm × mm)

SOIC-8                     2.3 × 3.75
SOIC-14                    2.3 × 4.0
SOIC-16                    3.8 × 5.6
SOIC-20                    3.8 × 5.6
SOIC-24                    4.3 × 5.6
SOIC-28                    4.8 × 5.6
TSSOP-20                   2.8 × 3.9
TSSOP-24                   2.8 × 4.7
QFP-44                     7.6 × 7.6
QFP-64                     8.8 × 8.8
QFP-100                    10 × 10
QFN-16 (3 × 3 mm)          1.6 × 1.6
QFN-20 (4 × 4 mm)          2 × 2
QFN-28 (5 × 5 mm)          3 × 3
QFN-48 (7 × 7 mm)          5 × 5
QFN-56 (8 × 8 mm)          6 × 6
SOT23-8                    1 × 1.5

This last package is very attractive for very small projects; it loads into tubes and is easily
assembled. The footprint is about 3 mm², basically that of the standard SMT transistor
package. For the sandbox practitioner, however, the expected quantities would need to be
huge, as a single 8-in. wafer can deliver over 25,000 1-mm² devices.
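That 25,000-device figure is easy to sanity-check with a crude gross-die estimate; the 0.1-mm saw street and the one-die edge-loss correction below are illustrative assumptions:

```python
import math

def gross_die_per_wafer(wafer_diameter_mm, die_w_mm, die_h_mm, street_mm=0.1):
    """Crude gross-die count: usable wafer area, shrunk by about one die
    dimension at the rim, divided by the die-plus-saw-street cell area."""
    r = wafer_diameter_mm / 2.0
    cell = (die_w_mm + street_mm) * (die_h_mm + street_mm)
    usable = math.pi * (r - max(die_w_mm, die_h_mm)) ** 2
    return int(usable / cell)

print(gross_die_per_wafer(200, 1.0, 1.0))   # comfortably over 25,000
```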
An alternative to standard packages could be the flip-chip approach, which comes in two
basic styles, solder ball and gold bump. These are basically bare die, not backlapped, which
can be applied directly to the target circuit. In the case of solder-balled parts, the IC pads are
plated to accept tiny solder balls that can be reflowed to the final circuit board. The solder
balls actually space the die somewhat from the PCB due to the rather high surface tension of
solder alloy, and an epoxy underfill must be applied to provide additional bonding strength, as
shown in Figure 13.2.
The gold-bumped parts are used for connection to the indium-tin oxide (ITO) transparent
conductors on an LCD display. In this case, the pad areas are selectively electroplated with
gold to a closely controlled thickness of around 15 to 20 μ through a removable-mask process.
The gold bumps project from the surface of the die, and make contact to the ITO through
anisotropic conductive film (ACF), which is simply an adhesive with conductive particles
sparsely distributed within. The conductive particles are actually elastic, polymer spheres of a
closely controlled dimension (approx 5-μ dia.) that have been coated with nickel and gold.
When the die is pressed against the adhesive/particle mixture, the population of conductive
balls is such that it is extremely probable that many conductive balls will be trapped between
the gold bumps and the ITO traces on the LCD glass. The excess adhesive will squish out
from under the die, and the remaining conductive balls are sufficiently sparse

Figure 13.2 Solder-bumped die as applied to PCB.


that any possible short between two different connections is extremely improbable. The curing
of the adhesive with UV light through the LCD glass completes the connection process,
yielding a very robust and permanent connection.
ACF is supplied as a refrigerated material of well-controlled thickness in strip form, which
requires careful alignment between the die and the glass substrate, and the controlled
application of heat and pressure to cure the adhesive as shown in Figure 13.3. It is not cheap,
but nonetheless quite competitive with other die attachment methods, especially for LCD
applications.
The cost for attaching solder balls or plating gold bumps is in the range of $150 to $200 for
each 8-in. wafer. After attachment of balls or the selective plating of gold, the wafers are diced
and packaged into waffle-packs, plastic carriers that come in a variety of cavity sizes. Waffle
packs are usually 2 or 4 in. square, with a lid that can be clipped on to allow safe shipment.
Pads for gold bumping can be set on a 60-μ pitch, with each bump approximately 40-μ
wide. As the electronics industry adapts to the relatively new requirement that all electronic
products be lead free, vendors are working on less-expensive methods of bumping wafers. At
present, the relative softness of gold still seems to find favor over other, more cost-effective,
materials.
However convenient in production, solder ball and gold-bumped die are extremely difficult
to work on in the prototype stage. These parts may require special bonding into test packages
if difficulties are found with the circuitry. The high pin counts that these circuits support make
prototype packaging for FIB modification and functional analysis difficult.
The design of LCD drivers is a special case, where a large number of I/O pads are required
on chip, but little internal circuitry. Best economy is obtained by making the aspect ratio of
the die large, which allows a large die periphery (to accommodate a large number of pads), but
a limited die area (to minimize cost). Organizing the pads with close pitch and a nonsquare
geometry allows better probability of contact through the ACF. Packaged parts are rarely
beyond an aspect ratio of 2:1, whereas an LCD driver die could measure only 10 mm by 2 mm
and support 368 pads.

Figure 13.3 Conductive spheres in ACF film connecting die to PCB.


Prototype Packaging and First Silicon


When first silicon returns, it is most beneficial to have the parts packaged in ceramic or
specialized plastic packages, so that the test die can be exposed for subsequent tests and
potential modifications. You will find these services to be quite expensive. The simple dicing
of a wafer can cost several hundred dollars, and the packaging of parts can be on the order
of $50 each. Even the simplest project can cost several thousand dollars to get from the wafer
level to a dozen packaged parts. The use of open-cavity packages, though, allows probing with
micromanipulated tungsten pins at a probe station, microscopic investigation of the finished
die, and focused ion beam (FIB) modification of your circuits.
If feasible, prototype packages should allow for easy attachment to test boards and enough
room for extra parts to be experimentally attached. It can often take significant time to “get
the project up,” depending on the device’s internal complexity. The prototype test setup
should have the ability to vary supply voltages, clock speeds, and device temperature. On this
last point, a soldering iron can quickly give a rough indication of high temperature variations,
but low temperatures through the use of freeze spray can cause condensation problems,
particularly if high impedance analog connections are present. A small office refrigerator can
come in handy, with a plastic bag surrounding the project, sealed around the attached system
leads. During initial tests, unusual arrangements are often welcomed in the sandbox test lab.
Prototype parts are completely unknown; that is, they are untested, and may contain serious
design or fabrication flaws. As a result, precautions should be taken to ensure that equipment
will not be damaged in the case of severe logic faults or in particular, a dead short across the
power supply. Every effort should be made to bring up the analog support circuits first,
making sure clock frequencies and regulator potentials are correct, before going on to higher
level functions. Particularly with open cavity packages, it is important to consider the
device’s sensitivity to light. I had a project once that showed a constant offset that couldn’t
be explained, until the bench light was turned off, after hours of trying to get SPICE to
explain it. Wring out problems with the support circuits first, and
then slowly work toward full chip functionality.
I like to “sniff” at the chip in the early tests, observing analog circuit sensitivity to things
like supply noise and bypassing, watching every pin with an oscilloscope, and noting
anomalies such as ringing on an analog pin. Only when the support circuits are in good order,
or at least their peculiarities are well understood, can you go on.
If problems are found, they can often be corrected without waiting three months for the next
revision, through the use of a focused ion beam (FIB) service. My experience is that most
problems are simple in nature and easily fixed by FIB, or (occasionally) so profoundly awful
that a second revision is, regrettably, the only next step.
The FIB process uses a device organized very much like a scanning electron microscope
(SEM). The part to be modified, in an open cavity package or in raw die form, is placed in an
evacuated chamber and scanned with an electron beam. An image can be obtained through the
detection of electrons backscattered from the workpiece. The scanning electron beam will
have great difficulty revealing features under the top silicon nitride layer, so most FIB
operations are done based on the part’s GDSII file, which shows the exact location of
features, automatically aligned with easily recognizable top-level patterns, such as bonding
pads.
When requesting a FIB operation, you simply specify what traces are to be cut and what
connections are to be made, as X and Y coordinates from your GDSII file. As an example: Cut
M3 at 2203.85,4566.05; jump from M3 at 2213.45, 4552.40 to M3 at 2203.85, 4567.40.
The FIB machine operator can substitute positive gallium ions for electrons, reverse the
accelerating potentials, and use the ion beam for milling into the various layers, while
introducing tiny amounts of various low-pressure gasses directed at the IC surface. The selection of gasses
and the adjustment of beam current will allow the etching of insulation or metal, or the
deposition of insulation or metal, usually platinum or tungsten. The process is highly
automated, and is surprisingly inexpensive, at least for simple operations. The cost for the
process is entirely based on machine time. In the simple example above, the cut and jump
traces are at the top metal, which is easiest to access, and the jump trace is short; the machine
time may be on the order of 20 minutes, at about $400 an hour. This is cheap, when the
alternative is to wait three months for the next revision. At a minimum, FIB repair can keep
the project going so that the second revision can be expected to be production silicon.
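Since FIB cost is pure machine time, the arithmetic for the example above is trivial (the helper function is just a sketch):

```python
def fib_cost(machine_minutes, rate_per_hour=400.0):
    """FIB charge: machine time only, at the hourly rate quoted in the text."""
    return machine_minutes / 60.0 * rate_per_hour

print(f"${fib_cost(20):.0f}")   # a 20-minute top-metal cut and jump
```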
Very long deposited runs or large areas can become quite expensive, so it is important to
carefully plan the simplest FIB possible to do the job. Probe pads can often be usefully
employed, so that you don’t run the risk of damaging the chip by attempting to scrape
through the nitride layer with the 1-μ tip of a tungsten needle under the microscope; it’s easy
to rip right through top metal traces, once the nitride layer is broken. Probe pads as small as
5 μ square can easily be hit with a micromanipulator for brief tests, and the cost to deposit
them by FIB is quite reasonable. Pads that are 10 μ square, however, cost four times as much,
since FIB deposition cost scales with area.
I began this book by jumping right into microscope issues, as it’s difficult to work with IC
designs without one. If your microscope has an X–Y measurement table and a 50-power
objective with at least an 8-mm working distance, then you can easily build it into a probe
station, although life will be much easier with a larger working distance, even if the objective
power is lower. Purchase a 0.25 or 0.375 in. thick steel plate that’s as large as will fit onto
the X–Y table, and fit it with some nice cork or rubber pads so that it sits on the X–Y table
without shifting about. Purchase some small, strong magnets and epoxy a short standoff to
each so that the magnets can be placed anywhere on the plate and your test board can be
tightly screwed into the magnets. When building the test board, be sure to remember that it
may have to go onto the probe station, and at these microscopic levels things must be built
sturdily. Make the board as small as possible and allow rather close spaced mounting holes
near the device. Also, position the part close to one edge of the test board, so that probe arms
can have unencumbered access.
Probe manipulators are not cheap, but they are nicely built; you only need one or two, and if
cared for, they will most likely last until silicon is no longer fashionable. At least one could be
an active probe with low loading capacitance, but many tests will be to simply load a line
(slowing it down) to see if a race condition is responsible for the chip’s problem. Also, many
probe operations are to obtain DC voltages or to provide some stimulus as an experiment; in
all of these cases a standard probe with a simple wire connection to the probe tip is all that is
necessary. Simple micromanipulators cost from $1000 to $2000 each and have magnetic bases
that quickly find their way solidly onto your steel plate. Be advised, most grades of stainless
steel, however pretty, are not magnetic. If it’s rusty, it’s likely to be nicely magnetic, and
can be cleaned up with sandpaper, which is an important tool in the sandbox.

Production Testing
Wafers may be probed at the foundry, and bad die may be marked so they will not be
packaged later on, but the economy of this process in an age of low-cost packaging and high
yields is suspect; all parts must be tested yet a second time after packaging to sort out
packaging failures. Further, the wafer probe step requires a carefully built probe card that will
have unavoidable lead inductances, potentially leading to false failures.


Post-package test, however, is of course required, and must be carefully considered. If your
packaging house can perform the final test, this “one-stop shopping” can often work to your
benefit.
The equipment used for IC test can range from a small customer-built PC board to a
multimillion-dollar at-speed logic tester with hundreds of connection lines, 150-ps resolution,
and precision analog analysis circuitry; it might even be a rack of equipment transiently
gathered for that particular test. In any case, parts must be reliably handled and properly
sorted, which in itself requires a relatively simple but expensive, high-reliability automated
handling machine.
Each part design will have a load board made for it that usually contains a replaceable
socket that interfaces with the device under test (DUT), plugged into another socket for rapid
exchange when the DUT socket inevitably fails or becomes unreliable. The wear on test
sockets is easily underestimated when your experience as an engineer is limited to testing
perhaps a dozen chips; the quality of these sockets must be extremely high.
The load board will be made to fit a particular tester, so the first consideration is which
tester to use. Conversations with the test engineers at the test house should be able to quickly
zero in on the expected level of sophistication required, which is complicated by the need to
balance test functionality and accuracy against cost. The cost of test depends on the cycle time
of parts through the machine and the initial cost of the machine; the test house must recover
their investment in their extraordinarily expensive equipment.
A modern, expensive tester that can cycle through test vectors quickly, although more
expensive on a per-hour basis than an older, slower, and cheaper machine, may in fact deliver
a more economical test. On the other hand, mixed signal devices may be split out into two
separate tests on individually inexpensive machines, or combined into a single test on a more
expensive single machine. The options are so broad that only one statement can be made:
testing is rarely cheap.
A feasible alternative is to build your own test setup, specifically designed to test the
functionality of your device. In this case, you must remember that the test engineers will need
multiples of the test setup, so that if one fails, another can go online immediately. Further, if it
is built by your own engineering department, then you will be responsible for its care and
adjustment, whenever necessary. The cost of chip-handling equipment (without a logic or
analog tester) is also high, but its “rental cost” is fairly low compared to a state-of-the-art
test system. The time for parts to step through the machine (while a test cannot be
performed) is fairly short, on the order of 1 second for a gravity fed machine. It is conceivable
that if your test is simple and can be accomplished in 0.5 seconds, then the cycle time could be
1.5 seconds per part, at a cost per part of perhaps two cents. Compared to elaborate testing
procedures on modern equipment,
this is really cheap.
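The two-cent figure follows directly from cycle time and hourly machine rate; the $48-an-hour handler rate below is an assumption chosen to reproduce it, not a number from the text:

```python
def cost_per_part(test_s, index_s=1.0, rate_per_hour=48.0):
    """Cost of one insertion: (handler index time + test time) priced at
    an assumed hourly rate for the handler and a simple test setup."""
    return (test_s + index_s) * rate_per_hour / 3600.0

print(f"${cost_per_part(0.5):.3f} per part")   # 1.5-s cycle: about two cents
```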

Test Vectors
Logic circuitry can be tested by stimulating the part with known input conditions and strobing
the outputs at specific points in time. Maximum propagation delay through the part can be
measured in this way. Unfortunately, the strobed logic outputs must exactly match the
expected values in the vector file, and the circuit must be initialized to a completely known
condition prior to test.
Therefore, if the ASIC contains a crystal oscillator or a PLL, these must be bypassed for a
proper logical test, as all stimulating signals must come from the tester.
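A minimal sketch of this strobe-and-compare discipline, with a toy DUT model standing in for the real part; the function names and the 'X' don't-care convention are illustrative, not from the text:

```python
def run_vectors(dut, vectors):
    """Apply each stimulus, strobe the settled outputs, and compare bit
    by bit against the expected pattern; 'X' marks a don't-care bit."""
    for cycle, (stimulus, expected) in enumerate(vectors):
        response = dut(stimulus)
        for got, exp in zip(response, expected):
            if exp != 'X' and got != exp:
                return f"FAIL at vector {cycle}"
    return "PASS"

# Toy DUT: a 2-bit adder modeled as a pure function of its input bits.
def adder(stim):
    a, b = int(stim[:2], 2), int(stim[2:], 2)
    return format((a + b) & 0b111, '03b')

vectors = [("0101", "010"), ("1111", "110"), ("1001", "X11")]
print(run_vectors(adder, vectors))   # PASS
```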
To some extent, logic testing is at odds with the “anything goes” sandbox approach,
where test is a secondary consideration. Sorry, this is where imagination meets reality. As
discouraging as such practical considerations are, be sure you can thoroughly test the parts you
design, through some means.


Odds and Ends


There are always dangling subjects that linger in an introductory book like this, which
don’t fit in anywhere else, so I’ve reserved this chapter for sweeping up the floor and
making use of what I find. I suppose the first subject to address after the nuts and bolts of
packaging is that of leadframe inductance.
Lead inductance is something that plays an important role in circuit performance,
especially in high-speed mixed signal circuits. The switching speed of submicron logic
can cause extremely sharp current pulses through the VDD and GND pins.

Flip-chip techniques allow the IC to directly attach to PCB traces, and GND and VDD
inductances to inner PCB layers can be very low. Flip-chip techniques, either the
soldering of the part directly to the PCB via plated “bumps” on the pads, or through
anisotropic conductive tape (which is quite pricey), do not allow you to probe the part
in development. However, leaded packages, which will continue to be common for ASIC
designs, have significant lead inductance. Assuming 6 nH of lead inductance to a
low-impedance supply in the application on both the VDD and GND pins, a 500-pF effective
capacitance of device gates and well connections, and the series resistances that would be
expected in both bonding wires and power/ground distribution traces, a series of on-chip
memory precharge pulses can be simulated roughly in SPICE.
Figure 14.1 shows the on-chip GND and VDD potentials, relative to the external
system ground. Imagine sensitive circuitry working in such an environment. The supply
variations are the result of leadframe and bondwire inductance and resistance, and the
resonance is due to that inductance resonating with the on-chip capacitance.
The resonant frequency of the leadframe/bondwire inductance and the on-chip logic
capacitance is 1/(2 × π × SQRT(L × C)), and the reactive
components have an impedance of SQRT(L/C) at resonance. If the resistances involved
are very low, due to a well-contacted substrate and perhaps the use of epi wafers, the
ringing will be more extreme. If the resistances in series with this resonant circuit are on
the order of the reactance values or higher, the oscillations will become better damped. In
this case, the resonant frequency is about 65 MHz, a frequency that radiates well from
PCB traces. Notice that the “ground bounce” will couple into every I/O pad, and be
present on every signal line—even signals that are inputs to the chip.
The addition of MOSCAP devices across the supply on chip can significantly reduce
the ringing, by lowering the resonant frequency and causing the reactance of the
leadwires to fall below the resistances in the leadwires and chip metallization. If every
extra space, under

Figure 14.1 VDD and GND potentials on-chip due to current pulses and lead
inductance.


wiring and in between bond pads, is occupied by MOSCAP features, a significant


improvement can be obtained. The following SPICE simulation shows the effect of adding
2 nF of MOSCAPs.
Figure 14.2 shows that the addition of the internal bypass caps significantly reduces noise
on the internal VDD and GND lines. When such capacitors are included on-chip, often with
no increase in die size, the ASIC becomes quiet enough that sensitive circuitry is more easily
included, and radiated RFI from the end product is dramatically reduced.
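The numbers above can be checked by hand. Assuming the 6 nH on each of the VDD and GND pins appears in series around the supply loop (12 nH total), the text's 65 MHz falls out, and the added 2 nF lowers both the frequency and the resonant impedance:

```python
import math

def f_res(L, C):
    """Resonant frequency of lead inductance and on-chip capacitance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

def z_res(L, C):
    """Impedance of the reactive components at resonance."""
    return math.sqrt(L / C)

L = 12e-9   # assumed: 6 nH on VDD plus 6 nH on GND, in series around the loop
for C, label in ((500e-12, "logic capacitance alone"),
                 (2.5e-9, "with 2 nF of MOSCAPs added")):
    print(f"{label}: {f_res(L, C) / 1e6:4.1f} MHz, {z_res(L, C):.1f} ohms")
```

The added capacitance drops the resonant impedance from about 4.9 Ω to about 2.2 Ω, below plausible wiring and bondwire resistance, which is exactly the damping mechanism described in the text.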
I strongly suggest MOS bypass capacitors in your design. You will be amazed at how quiet
the resulting product will be, except for the other parts in your product that weren’t designed
with this consideration. If the target product only contains your ASIC and a few passive
components, you will be able to get by with a one- or two-sided PCB and more easily fit
within RFI emission rules.
If the design is core limited, build a MOSCAP cell that fits between the pads and attaches
directly between the power busses around the die. Keep gate lengths in the MOSCAPs no
longer than maybe 5 μ, and size the gate poly so that all such capacitors in parallel present a
total poly resistance of less than 1 Ω. Ensure good
substrate contacting between each MOSCAP structure.
There may be areas within your circuit where bypass caps can be designed in, such as under
a signal distribution bus. In a three metal process, the bus can be run in M2, connections can
come out of the bus to circuit blocks in M3, and MOS capacitors can be built underneath,
connected to VDD and GND with M1.
Finally, in most packages, some leadframe paths are longer than others. Always choose the
short paths for supply and ground, never the

Figure 14.2 On-chip VDD and GND potentials with integrated supply bypassing.


long ones. If your design is in a TSSOP package, place power and ground pads in the middle
of the pin rows where the bond wires and leadframe elements will be as short as possible.

GND and VDD Distribution


The current impulse drawn between supply and GND is particularly nasty in large memories; a
64K bit SRAM may draw 500 mA for 1 ns, which is difficult to feed with thin metal traces.
The average current may be very low, but the peak current is extreme. As you draw power
busses around the memory, try to calculate the metal resistance as you go; imagine the path
the currents will take as they find their way to MOS bypass caps and the VDD/GND pins. In
fact, design the memory with this problem in mind at the start, with a plan as to how the
memory will fit in the final design. Do not try to make the die smaller by restricting the size of
power busses. Good power connections can be an integral part of the memory design itself.
For speed considerations, such a large block would normally be divided into sections, and
solid busses can surround each block.
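The scale of the problem is easy to see from the 500 mA / 1 ns figure; the bypass capacitance values and the metal sheet resistance used below are illustrative assumptions:

```python
# Charge per precharge pulse, from the 500 mA for 1 ns figure in the text.
I, t = 0.5, 1e-9
q = I * t   # 0.5 nC per pulse

# Droop if a local on-chip bypass capacitor must source that charge alone.
for C in (500e-12, 2e-9):
    print(f"C = {C * 1e9:.1f} nF -> droop ~ {q / C * 1e3:.0f} mV")

# IR drop if the pulse instead ran through a thin metal bus: assumed
# 50 mohm/sq sheet resistance and 20 squares (say, 200 um long, 10 um wide).
r_bus = 0.05 * 20
print(f"thin-bus IR drop ~ {I * r_bus * 1e3:.0f} mV")
```

Hundreds of millivolts either way, which is why the power busses and local bypass capacitance deserve to be part of the memory design from the start.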
A note to the switched capacitor designer: In applications where large memory blocks are
divided into sections, with the intent that only one block at a time needs to be accessed,
remember that the substrate noise at your switched capacitor circuit will depend on which
memory block is accessed. Substrate disturbances that are identical on each switch opening
amount to a fixed condition, while disturbances that vary randomly from one opening to the
next constitute noise. Each block of memory will reside in a different location on the chip, and
have different paths through which current is supplied; therefore, each block will influence
switched capacitor circuitry differently. In such cases, I suggest accessing and precharging the
entire array at once, and only transferring data to or from the selected block. Overall power
consumption is higher, but SC noise is lower, which may be more important.
Wherever possible, make use of the peripheral bus (seal ring) to both carry ground currents
and contact the substrate. Allow for metal runs (all layers with vias between) to occasionally
run between pads from the peripheral bus to the inner ground strip that lies between the
padframe and the core. A full metal stack in the peripheral bus, which is reasonably wide, can
offer a lower resistance, increasing your project size by maybe only 50 μ.
Especially in mixed signal designs, the use of multiple ground pads helps keep the substrate
at a quieter potential. Multiple GND and VDD pads are required near pads that drive fast logic
signals out to external capacitive loads, in which case you may require a GND pad and a VDD
pad for every two to four signal lines, depending on their driving aggressiveness. From a
switched capacitor circuit point of view, differential signals coming into the device will suffer
common mode noise due to substrate bounce, which can be minimized by the use of multiple
ground pads, particularly on each side of the differential input pads. Look at it this way: a
3 mm² project with one ground pad and good power rails with a solid seal ring may have a
metal resistance of 1.6 Ω from the ground pad position to the opposite side of the die; placing
a second ground pad at that opposite side moves the worst case to the midpoints of the ends of
the die, where the resistance is roughly on the order of 0.4 Ω.
In the case of switched capacitor ADCs and DACs, you should establish an analog VDD
that is separate from the digital VDD. Of course, ground is common to both, so I suggest
placing your switched capacitor circuits in an area where logic ground currents flow the least.
This can be accomplished by routing logic circuitry on one end of the die, leaving the more
sensitive analog circuits at the other, with ground pads at each side of the die. In the case of
converters, it is handy to use supply and ground as reference potentials, and you most certainly
can, but never the ones on chip. Bring in reference pins as signals through their own bonding
pads; in application, these pins can be connected to a well bypassed supply and ground, but
not on-chip. The supply and ground lines within the IC will be very noisy.
If your design is intentionally pin-limited, better on-chip references can be obtained from
the VDD or GND pins by bonding two adjacent pads to a common leadframe element; for
example, one for the GND connection and one for a more “quiet” ground. The voltage drop
across the bonding wire is significantly greater than that across the short leadframe element.

The MID Supply


Analog circuitry can benefit from a mid-supply pin, intended to be bypassed in the application
with an external capacitor to ground. This supplies a perfect bias potential that can be
relatively immune to substrate or VDD transients. Place a pair of bias resistors across the
supply within the design, and rely on the external cap to keep the mid-supply terminal stable.
Amplifiers on-chip can be made to have common mode ranges that include ground (using
PMOS input differential pairs), or supply (using NMOS inputs), but amplifiers will often show
slight input offsets as the common mode potential is varied. In your analog circuit designs,
always attempt to make the circuits so that one amplifier input is attached to the mid-supply,
presumably the + input, and arrange your circuitry so that the two amplifier input terminals are
approximately (if not exactly) at the mid potential.
The mid potential is a signal, not a supply. If a buffered version is required, a simple
amplifier can be built to provide a low-impedance version of the mid signal. Be sure to design
any mid-supply buffer so that it can deliver the peak currents that may be required. In
switched-capacitor

Printed from Digital Engineering Library @ McGraw-Hill (www.Digitalengineeringlibrary.com).


Copyright ©2004 The McGraw-Hill Companies. All rights reserved.
Any use is subject to the Terms of Use as given at the website.
ODDS AND ENDS Keith Barr 368

designs, this could be significant, but remember: The supply only needs to be stable and
correct at the moment the switches open! At other times, the mid-supply can show
considerable transient variation.

Substrate Connections and Sensitive Circuitry


The substrate is presumably contacted around its periphery, and in numerous places within the
IC, to prevent local latchup conditions. The ground connections to these substrate contacts
often have logic device source currents flowing through them, and lengthy ground lines to the
actual GND pads. This causes the substrate, in general, to have a noise component, even if
ground connections could be made through the leadframe with zero inductance. The substrate
that sensitive circuits are built within will always have this noise in the background. When you
place substrate contacts within sensitive circuits, say, a switched capacitor amplifier, the
potential of the local substrate will be different from the ground potential you are attempting
to contact it with. This means that potential gradients will exist in the immediate vicinity of
the substrate contacts.
Figure C14.1 in the color section shows an example of a differential pair of transistors
surrounded by substrate contacts.
This is an NMOS pair of transistors, with four devices in parallel representing each device. The
transistors are interleaved; that is, the first, second, fifth, and sixth poly strips belong
to one transistor, and the third, fourth, seventh, and eighth poly strips belong to the
other. This approach to layout gives the best chance of transistor matching. Around
the outside is a P-implanted diffusion that acts as a substrate contact. In a noisy IC, logic
circuitry will cause the substrate bulk to be at a potential that is (from moment to moment)
perhaps tens of millivolts different from the local ground connection that would connect to the
substrate contact. The potential of the substrate in the center of the differential pair will be
different from the substrate potential at the edges of the pair. When designing low-noise
amplifiers, keep some distance from the substrate contacts, or at least make the design as
balanced and uniform as possible. Also, keep some space between the diffpair (with its
surrounding substrate contact) and other circuits that have their own substrate contacts, to keep
the substrate potential under the differential pair as uniform and balanced as possible.
Many books talk about common centroid layout techniques, which I have never found
valuable. The interleaved transistor approach shown here is the easiest to draw, best
performing, and most likely to be dynamically balanced layout solution. The most common
problem is nonuniform substrate potentials that allow substrate noise to contaminate signals.
To achieve a better balance in the layout shown, I would extend the distance between the
substrate contacts and the NMOS devices a
bit, perhaps a micron or two, and provide more space between the bottom input metal 1 strip
and the substrate contact metal 1. Also, a wider substrate contact area, perhaps two contacts
wide, would be preferable.

Power-Up Circuits
When power is applied to your ASIC in its target application, some means of reset is required
to ensure a known starting condition. You may wish to reset counters, clear memory, and
generally establish a known condition. Further, if a crystal oscillator is involved, some time
may be required for it to come up to stable operation. For this you will need a power-on
detector, which could easily be a bandgap reference or a crude variant of the bandgap.
The application will most likely have bypass capacitors across the supply, and as the power
supply current that charges this capacitance is limited, you can expect that the rate of rise
at the ASIC power terminals will be on the order of a volt per microsecond or so. This is ample
time within which to detect that power is being applied, but perhaps insufficient for full
operation. Once the supply voltage is established within some preset limit, a simple counter
can be used to measure out some number of crystal oscillator cycles, and device operation can
begin. The circuit can be a bandgap reference to monitor the supply accurately, or a simple
start-up circuit can produce a low output that only goes high when the supply is sufficient to
run logic circuitry, as shown in Figure 14.3.
Figure 14.3 Start-up circuit that produces a high output once VDD is past a minimum
threshold.

This circuit is small and simple, and will ensure that the supply is at least an NMOS
threshold plus a PMOS threshold potential, allowing
registers to be reset properly. The output will be high during normal operation, which is why I
named it SUP, for supply-up. The SUP signal can be used to reset a counter that is clocked by
the crystal oscillator, which in turn can produce a RUN signal once the counter’s period has
elapsed. In the case of an XT crystal, this should be tens of milliseconds, and in the case of a
watch crystal, perhaps half a second.
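The counter length follows directly from the clock rate and the desired delay; a quick sizing sketch (the 4 MHz XT frequency and 20 ms target are assumptions, while the 32.768 kHz watch-crystal rate is the standard figure):

```python
import math

def por_counter_bits(f_clk_hz, delay_s):
    """Cycles and counter bits needed to time out delay_s at f_clk_hz."""
    cycles = int(f_clk_hz * delay_s)
    return cycles, math.ceil(math.log2(cycles))

# XT crystal: assume 4 MHz and a 20 ms settling target.
xt_cycles, xt_bits = por_counter_bits(4e6, 20e-3)
# Watch crystal: 32.768 kHz, about half a second as suggested above.
watch_cycles, watch_bits = por_counter_bits(32768, 0.5)
print(xt_bits, watch_bits)
```

Under these assumptions the XT case needs a 17-bit counter, and the watch-crystal case exactly 14 bits, since 32768 × 0.5 is a power of two.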

The Schmitt Trigger


The division between analog and digital circuits is fairly clear, but at the juncture sits the
Schmitt trigger. Not really an amplifier, a gate, or a simple inverter, the Schmitt trigger spans
the gap between these two worlds. Whenever an analog signal that has a limited rise or fall
time must be brought into a logic block as a digital signal, the Schmitt trigger becomes
necessary.

The Schmitt trigger is an inverter constructed with pairs of transistors, plus feedback devices
(driven by the output) that force the output to one extreme or the other, regardless of the
voltage at the input. The Schmitt trigger displays hysteresis; that is, when the input is low,
the output will be high; as the input is made more positive, a positive
threshold point is reached where the output will fall to a fully low logic level. When the input
is then brought to a lower potential, another point, the negative threshold, is reached where the
output will go solidly positive. The difference between these two thresholds is the hysteresis
potential at the input.
The threshold values can be adjusted by the sizing of the devices. Typically, the inverter
devices are identical, with the P devices perhaps twice the width of the N devices, providing
thresholds that are nearly symmetrical about mid-supply, and the feedback devices are sized to
determine the magnitude of hysteresis. All devices can be constructed as the same size in
noncritical applications, which is normally the case. Be aware that despite the full logic output
of the Schmitt trigger, the device will draw supply current when the input is at a potential
other than a full logic level. If the signal is not a particularly fast one, or time critical, design
the Schmitt trigger with long and narrow gates to minimize power consumption when the
device is presented with analog signal levels.
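Behaviorally, the hysteresis described above is just a comparator with memory; a minimal model (the threshold voltages here are illustrative assumptions, not derived from any device sizing):

```python
class Schmitt:
    """Inverting Schmitt trigger: the output holds its state until the
    input crosses the threshold on the far side of the hysteresis window."""
    def __init__(self, v_low=1.5, v_high=3.5):  # assumed thresholds, volts
        self.v_low, self.v_high = v_low, v_high
        self.out = 1          # input assumed low at start, so output high

    def step(self, vin):
        if self.out == 1 and vin > self.v_high:
            self.out = 0      # rising input crosses the positive threshold
        elif self.out == 0 and vin < self.v_low:
            self.out = 1      # falling input crosses the negative threshold
        return self.out

s = Schmitt()
trace = [s.step(v) for v in (0.0, 2.0, 4.0, 2.0, 1.0)]
print(trace)
```

Note that the 2.0 V input yields a different output on the way up than on the way down; that difference is the hysteresis.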

Testing the ASIC in Production


Circuits like the start-up circuit shown earlier may be required during normal operation, but
are a problem during tests. When the part is tested, it will be automatically handled by
machinery that will place the part in a test fixture, apply known signals and check the resulting
outputs for correctness. For analog signals, a rack of analog test equipment is used, applying
signals and measuring outputs that must be within specified margins; when the part contains
logic, however, the situation is very different.
Logic circuitry is tested by the application of a reset signal, so that all circuitry is in a
known condition, and clock and input conditions that stimulate the logic circuit are applied
sequentially to produce the expected output logic levels. This process is not “soft” as is the
case with analog testing; the circuit will not have a crystal attached, as the time for the crystal
to come up to speed is long, and the phase relationship of a crystal oscillator would be
incompatible with a logic tester. Logic testers cannot base their stimulus pattern on a signal
coming from the part under test; they must produce all of the stimulus signals and either
accept the part if its outputs are correct, or reject it if not. As a result, you may have to
provide extra pins for the logic tester to work with.
If the part has a crystal oscillator included, and perhaps an internal PLL, these circuits will
need to be completely bypassed during logic testing. They can be tested later on with an
analog test station as a separate test operation. If one pin on the part is dedicated to test, then
other pins can have dual functions; for example, one of your crystal pins
could become a reset pin when the test pin is active, and other pins that are analog in nature
can be used as clock and data inputs. The test pin signal may be routed to all of your analog
circuitry to switch over to “test mode.”
You must design your logic circuitry so that during test, a defined pattern of stimulus
signals will produce a defined output, and that all of the circuitry will be tested. This is a very
difficult process to think through: if your test does not cover the entire circuit, defects at
the untested spots will escape detection and find their way into production. If you
are selling your ASIC as a product in the commercial market, you will find this to be very bad
when it comes to customer relationships. If you have a ROM in the design, find a clever way
to use every bit in a checksum or perhaps as part of the signal process, so that every bit is
ultimately tested. The same goes for RAM. Consider checkerboard patterns that will detect
shorted bitlines, and make sure every possible combination of logic signals is represented.
The tester you use will be recommended by the test house, as they will have many
combinations of equipment for this purpose. You will work with their test engineers to arrive
at an economical solution. The tester will most likely run at a much lower frequency than
would be the case in the application product, as at-speed testers are usually quite expensive,
and not necessary. Your test may be limited to a 1 MHz clock, but timing can be determined
by the moment that the ASIC outputs are strobed into the tester for logical equivalence to the
supplied test routine’s expected output. In this way, even though the test is proceeding at a
relatively low rate, signal propagation times can be determined to some maximum limit.
The test routine will be developed through the use of your logic simulator. You will produce
a test pattern that you will apply within the simulation environment, and cause the simulator to
print out the expected results. Working with the test engineer, you will prepare the data in a
format that is usually unique to each test machine. The test house will prepare a DUT board
that will interface their equipment to your project, and keep it on hand, along with the test
pattern data.
There is a “soft” test even when dealing with logic circuitry, and that is the Idd leakage
test, usually performed during a logic sequence: the stimulus is stopped, a short period
of time elapses, and the leakage current of the part is measured. Leakage through logic devices
often indicates a flaw in the silicon substrate, or a broken gate that is allowing a transistor
to be partially on, even when the drive to that transistor is commanding it to be off.

The “floating” poly strip indicates a flaw in processing, which is often revealed during
the leakage test. The Idd leakage limit, usually on the order of a few microamps, is
determined by the process and the number of transistors in the design.
During such Idd leakage testing, ALL of the analog circuits must be off, and any floating
nodes within them must be brought to conditions that will cause a minimum of leakage
current. This means that every circuit I’ve shown earlier must have devices to interrupt
currents, break resistive dividers, and clamp potentially floating nodes to ground or supply, as
required. The development of TEST and TESTN signals that are distributed throughout your
analog circuits is mandatory. Keep this in mind while designing your analog circuits; do not
wait until the project ends to realize that every analog block will need to be completely shut
off during a logic Idd leakage test.

Sensors
Jumping to a completely different subject, one that could (but won’t here) find a chapter of
its own, is that of sensors. Although sophisticated techniques well beyond those available in a
commercial CMOS process are possible, your project will be very much complicated by
attempting to design in anything other than the standard features. Microelectromechanical
(MEMS) devices can be built through the use of specialized etching techniques to produce
miniature, cantilevered, movable structures for sensing acceleration or pressure or directing a
reflected light beam, but these techniques are not normally available in a standard CMOS
process. The few sensors that are available in a standard process, however, may be of interest
to you. I’ve already detailed the temperature sensor, but beyond that we have light,
magnetism, and stress.

Optical sensors
At high noon on a clear day, the solar radiation within the visible wavelength range is about 1
kW/m2. The average energy per photon is about 2 eV, which calculates out to about 3 billion
photons falling on each square micron of exposed surface, during each second. The light in a
moderately well-lit room is about 1% of this level, and yet we can still barely see objects at
night with an illumination on the order of 1/100,000 of full sunlight.
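The photon-flux figure above is easy to verify (the 2 eV average photon energy is the text's round number):

```python
# Check the photon-flux figure: 1 kW/m^2 of ~2 eV photons, per square micron.
irradiance = 1000.0          # W/m^2, full sun in the visible (from the text)
e_photon = 2.0 * 1.602e-19   # 2 eV average photon energy, in joules
um2_per_m2 = 1e12            # square microns per square meter

flux_per_um2 = irradiance / e_photon / um2_per_m2
print(f"{flux_per_um2:.2e} photons per square micron per second")
```

The result is about 3.1 × 10⁹ photons per square micron per second, matching the "about 3 billion" figure.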
As photons hit the surface of an IC, those that are absorbed within or near the depletion
region of a reverse-biased junction will generate electron-hole pairs that can be sensed
electrically. Longer wavelengths, beyond a micron or so, will pass through the silicon and will
not produce electron-hole pairs; silicon is transparent at these longer wavelengths.
Visible light is composed of wavelengths in the range of 400 to 700 nm, and the penetration
depth into silicon is greater at the longer, red wavelengths, measuring several microns at least,
and on the order of a few tenths of a micron at the blue end of the spectrum. Therefore, simple
diffused source/drain junctions will be more sensitive to blue light, and the deeper N well
junctions will be more sensitive to the red. In general, a 100 μ × 100 μ well junction, biased
to some reverse potential, will conduct perhaps 3 μA when exposed to direct sunlight. The dark
current, that is, the leakage current for the N well structure, would be on the order of 100 fA,
or less, at room temperature, allowing the detection of light that is some 10,000,000 times
weaker than direct sunlight. Cooling the sensor will lower leakage yet further, allowing for
greater low light level sensitivity; leakage will double for every 10°C of temperature rise.
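Putting the text's numbers together (the 85°C projection is my extrapolation of the doubling-per-10°C rule):

```python
# Dynamic range of the 100u x 100u well photodiode, from the text's numbers.
i_sun = 3e-6          # photocurrent in direct sunlight, amps (from text)
i_dark_25c = 100e-15  # dark current at room temperature, amps (from text)

dynamic_range = i_sun / i_dark_25c   # ~3e7: "some 10,000,000 times weaker"

def dark_current(i_25c, temp_c):
    """Leakage doubles for every 10 C of temperature rise."""
    return i_25c * 2 ** ((temp_c - 25) / 10)

i_dark_85c = dark_current(i_dark_25c, 85)
print(f"range ~{dynamic_range:.0e}, dark current at 85 C = {i_dark_85c*1e12:.1f} pA")
```

At 85°C the dark current has grown 64-fold, to about 6.4 pA, eating roughly two decades of the low-light range; cooling buys it back.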
The insulating coatings that cover the silicon, both the silicon dioxide used as intermetal
insulation and the silicon nitride top coating, are clear, but they have a high index of
refraction (about 2.0 for Si3N4 and 1.46 for SiO2). Their thinness causes surface reflection
of some of the light, and selective filtering throughout the visible range, which can account
for both a loss of efficiency (up to 50%) and a variation in color sensitivity (+/−20% across
the visible band). The spectral peak for efficient operation with the N well feature is in the
650 to 900 nm range, perfect for visible red and infrared LED light.
Photo detectors can be developed to quantify light levels over an extreme range. The log
characteristic of the MOSFET operating in the subthreshold region could be used as shown in
Figure 14.4.
Although scaling will vary with temperature (a problem which could be solved with some
clever circuits), the output should be a good log representation of the light level, over a 6 to 8
orders of magnitude range.
Light detectors built with the well feature, despite the rather low capacitance of the junction,
will be slow unless an amplifier is used to
overcome this limitation, leading to the familiar transimpedance amplifier configuration
shown in Figure 14.5.

Figure 14.4 Log amplifier for accepting a photodiode input.
The photo sensor of Figure 14.5 is a 100 μ × 100 μ well. The amplifier provides feedback
through R1. The feedback capacitor C1 is critical,

Figure 14.5 Transimpedance amplifier for photodiode use.

as it will determine circuit stability and overall bandwidth. When driven by a strong signal of
about 2 mW from a laser diode modulated at 10 MHz, the resulting output by SPICE
simulation shows good gain that could easily be followed by a fast comparator, as shown in
Figure 14.6.

Figure 14.6 Transimpedance amplifier output for pulsed light.
Smaller diode junctions can lead to greater operating speed, due to reduced junction
capacitance. The noise out of the transimpedance amplifier is the thermal noise of the
feedback resistor R1, measured over the system bandwidth. Additionally, shot noise from
background currents, either diode leakage or ambient light, will add to the thermal noise. The
above example of a 2 mW light source falling on the 100 μ × 100 μ detector is extreme; the
noise level, however, is about 500 μV over a 20-MHz bandwidth. The circuit will be capable of
receiving much lower light levels, on the order of microwatts, before noise becomes a
problem.
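The text does not give R1's value, but the quoted noise floor implies it: the RMS thermal noise of a resistor is sqrt(4kTRB), and 500 μV over a 20 MHz bandwidth corresponds to R1 of roughly 750 kΩ. The resistor value below is that inference, not a stated figure:

```python
import math

k_B = 1.38e-23   # Boltzmann constant, J/K
T = 300.0        # room temperature, K
B = 20e6         # bandwidth from the text, Hz
R1 = 750e3       # feedback resistor -- inferred from the noise, not stated

v_noise = math.sqrt(4 * k_B * T * R1 * B)   # RMS thermal noise of R1
print(f"{v_noise*1e6:.0f} uV rms")
```

This lands within a few percent of the quoted 500 μV; a larger R1 gives more transimpedance gain but proportionally more noise voltage and less bandwidth for a given C1.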

CMOS cameras
Arrays can be built, the CMOS camera being the best example. The basic structure makes use
of a cell that is arrayed into rows and columns, much like a memory. Each cell (pixel) utilizes
a diode junction that can be charged to a reset potential; as light falls on the diode, its charge
will be depleted. Sampling the diode potential at some time after reset is turned off gives a
measure of light at that pixel.
The active pixel cell of Figure 14.7 must be small, so the diode is often a diffusion feature
as opposed to a well feature. The devices within the cell are all NMOS, so that the bulky
N wells required for PMOS devices are not needed within the cell. M1 resets the potential
across the diode junction as a source follower, which suffers from a high threshold voltage
due to the body effect. M2 is a source follower that is attached to the readout column line
through the activation of ROW_READ and M3. When the
COLUMN_OUT lines are terminated with current sources, the columns can be simultaneously
evaluated for pixel potential. The time between ROW_RESET going low and the activation of
ROW_READ establishes a shutter period that can be varied with logic circuits to adapt the
system to varying exposure levels. A layout of a 9 μ × 9 μ active pixel cell in a 0.6-μ process
is shown in Figure C14.2, in the color section.

Figure 14.7 Schematic of a pixel cell for imaging arrays.
The active element is the N diffusion, which constitutes only about 25% of the cell area. The
COLUMN_OUT signal will start at about 2 V for a fully charged (reset) cell, and will
decay to GND after sufficient exposure. This is a fairly small range, and the precise reset
potential is critical for good resolution in the dark regions of the image. Since the reset
transistor M1 has a somewhat variable threshold voltage, it is useful to read the cell twice,
once during reset, and a second time after the exposure period. The difference between the
two values provides a more accurate reading of the pixel exposure level.
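This double-read scheme is a form of correlated double sampling; an idealized sketch (the reset level, threshold spread, and exposure drop are made-up numbers):

```python
def read_pixel(v_reset_nominal, vt_offset, exposure_drop):
    """Model one pixel: the reset level is corrupted by the reset
    transistor's threshold variation; exposure then discharges the node."""
    v_at_reset = v_reset_nominal - vt_offset       # first read, during reset
    v_after_exposure = v_at_reset - exposure_drop  # second read, after exposure
    return v_at_reset, v_after_exposure

# Two pixels with identical exposure but different (assumed) Vt offsets.
results = {}
for vt in (0.05, 0.12):
    r_reset, r_exposed = read_pixel(2.0, vt, exposure_drop=0.8)
    results[vt] = (r_exposed, r_reset - r_exposed)  # single read vs. difference
print(results)
```

The single reads disagree because of the threshold spread, while the difference reads are identical, which is exactly the fixed-pattern-noise cancellation the double read buys.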

Hall and Strain Sensors


Currents flowing through doped semiconductors are affected by magnetic fields. The velocity
of the carrier flow is set by the mobility of the semiconductor, which for N-type silicon is
on the order of 600 centimeters per second per volt per centimeter, or 600 cm²/V·s. Higher
mobility materials allow greater carrier velocities and greater magnetic influence. For CMOS
processes, the N diffusion or well is used, but not P diffusion, as the mobility of P silicon is
about one-third that of N material.
The Hall element is usually a square of well that is contacted on its sides, and with careful
layout becomes a balanced bridge. A crude example is shown in Figure 14.8.

Figure 14.8 Hall sensor construction.

A magnetic field applied perpendicular to the die surface will cause the carrier flow, in this
case electrons, to take a curved path. The result is a difference in potential between the side
contacts. Typical outputs are on the order of 20 mV in an applied field of 1000 G with an
element potential of 5 V. The current consumption is significant: with a typical N well sheet
resistance of 2 kΩ/sq, this single-element sensor would draw 2.5 mA at VDD = 5 V.
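The bias-current arithmetic is worth making explicit (the 50 G example field is mine; the other numbers are from the text):

```python
# Hall element numbers from the text: a square of N well, so the bias
# path is one "square" of sheet resistance.
rho_sheet = 2e3    # N well sheet resistance, ohms per square (from text)
squares = 1.0      # square element: length equals width
vdd = 5.0          # element bias, volts (from text)

i_bias = vdd / (rho_sheet * squares)   # 2.5 mA, as stated above
sensitivity = 20e-3 / 1000.0           # 20 mV per 1000 G (from text), V/G
v_out_50g = sensitivity * 50           # output for an example 50 G field
print(f"bias {i_bias*1e3:.1f} mA, 50 G output {v_out_50g*1e3:.1f} mV")
```

A 50 G field, roughly what a small magnet delivers at a distance, yields only 1 mV, which is why the strain-induced offsets discussed next matter so much.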
Unfortunately, silicon is also strain-sensitive, which can cause the offset of the Hall element
to be greater than the magnetic signal that is to be detected. When stretched, silicon will show
a substantial increase in its electrical resistance. Figure 14.9 shows four resistors arranged as a
strain sensor.

Figure 14.9 Strain sensor.

The four resistors in the bridge are simply N diffusions arranged as identical resistors and
biased at the supply potential. If this simple structure is prototyped and mounted in a ceramic
package, bending the package with your fingers can easily create a 50 mV imbalance in the
bridge.
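For scale, a Wheatstone bridge with one arm changed by a fraction δ delivers roughly VDD·δ/4, so a 50 mV imbalance at a 5 V supply corresponds to about a 4% resistance change (my arithmetic, not the book's):

```python
def bridge_output(vdd, delta):
    """Wheatstone bridge: one arm at R*(1+delta), the other three at R."""
    v_ref = vdd / 2.0                    # midpoint of the fixed divider
    v_sense = vdd * 1.0 / (2.0 + delta)  # divider containing the strained arm
    return v_ref - v_sense

v = bridge_output(5.0, 0.04)   # 4% resistance change on one arm
print(f"{v*1e3:.1f} mV")
```

The exact expression gives about 49 mV for δ = 0.04, consistent with the finger-bending observation above.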
We can easily see the similarities between the Hall sensor and the strain sensor; the Hall
sensor will be very much affected by stresses that remain after packaging. To remove the
strain-induced offset from the Hall sensor, two Hall sensors can be arranged, with one rotated
90° from the other, and their connections placed in parallel.
A single Hall sensor can be used through a switching technique that applies power and
picks off signals from the four terminal device in a rotating fashion; the results of the four
possible measurements, when averaged, remove both layout-induced and strain-induced
offsets, allowing the detection of much lower magnetic fields.
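An idealized model of that rotating measurement: the resistive offset reverses sign between orthogonal bias phases while the Hall term does not, so a plain average keeps the field signal and cancels the offset. This ignores switching transients and the sign bookkeeping a real implementation needs:

```python
def spinning_average(v_hall, v_offset):
    """Four bias phases: the offset term alternates sign between
    orthogonal phases while the Hall term does not (idealized model)."""
    phases = [v_hall + v_offset, v_hall - v_offset,
              v_hall + v_offset, v_hall - v_offset]
    return sum(phases) / len(phases)

# A 1 mV Hall signal buried under a 50 mV strain-induced offset.
recovered = spinning_average(1e-3, 50e-3)
print(recovered)
```

The average returns the 1 mV Hall term even though each individual reading is dominated by the offset.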

Supply-Boost Circuits
Some applications may require internal supply voltages that are higher than that provided by
VDD, in which case switches and capacitors can be used to produce internal boosted supply
voltages. Classic boost circuits are simple, as shown in Figure 14.10.
These circuits use MOS devices as diodes, and MOSCAPs as output filters. In the case of
the negative boost circuit, Q1 could actually be a PMOS device, but it will behave as a PNP
bipolar. The boost voltages provided by these circuits are of course less than VDD × 2, since
the devices will suffer a forward voltage drop when acting as diodes.
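To put numbers on that ceiling: in an unloaded Dickson-style estimate, each pumping stage (and the input diode) contributes VDD minus one diode drop. The supply and drop values below are assumptions:

```python
def dickson_output(vdd, v_drop, stages=1):
    """Unloaded Dickson charge-pump estimate: the input diode and each
    pumping stage each contribute (VDD - one diode drop)."""
    return (stages + 1) * (vdd - v_drop)

# Assumed 3.3 V supply with ~0.8 V MOS-diode drops.
v_boost = dickson_output(3.3, 0.8)
print(f"{v_boost:.1f} V")   # well short of 2 * VDD = 6.6 V
```

With these assumed values a single-stage doubler reaches only about 5.0 V, which is why the threshold-free differential techniques below are attractive.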
For greater output potentials, two approaches can be taken, both basically differential in
nature. The first is based on the above concept, but expanded to any number of stages, and is
shown in Figure 14.11.
The voltages obtained can be damaging to the thin gate oxide of the devices; it is expected
that this will be taken into consideration.

Figure 14.10 Voltage boost circuits.

Figure 14.11 Multiple stage voltage boosting.

Such differential techniques can be used to create outputs much closer to VDD × 2 with a
single stage, shown in Figure 14.12.
Such stages can be cascaded to achieve higher voltages.
These boost circuits can be used at very high frequencies to minimize the requirement for
large capacitor values. Fifty megahertz is not unreasonable, but the clock signals should be
strong and symmetrical. The currents from such boosted supplies are usually not intended for
high-power applications; these circuits are usually only used in cases where high voltage is
passed at very low current, as in the programming or erase of a memory through a tunneling
mechanism, where the required currents are trivial.
Figure 14.12 Boost circuit using cross-coupled pairs.

When designing boost circuits, be careful to notice conditions that SPICE may not handle
well, as the overshoot of signals above a well
potential may cause current spikes into the substrate. These currents will effectively be losses
that SPICE will model poorly without detailed models of your exact structure.

My Circuits, Your Circuits


Throughout the text, I’ve drawn circuits that must be understood as examples of how designs
can be structured. By no means should they be used directly, because they are only examples.
The beauty of IC design, particularly analog design, is the infinite variety of ideas and circuit
implementations. I’ve tried to show ideas, with the hope that they inspire new thoughts to
seed your own cleverness.
The sandbox designer is perpetually inquisitive, curious, challenging, and daring. You
should have a little voice inside refusing to use any of the circuits herein directly, solidly
slamming down the challenge to do them one better. This will make them your circuits, and
hopefully, better ones. The others you work with may not understand this; they only want to
know that the parts work. You, however, need a reason for doing it in the first place, and
seeing your own ideas and creativity come to be is a powerful motivator. It’s an existential
thing; something that you can find great expression through, and identity with.
I call this sandbox “meaning.” Revel in it.

Parting Thoughts
Part of “odds and ends” is the end itself, and we’ve arrived. The details of any circuit
structures you imagine will surely require further study: research into published papers, a
better understanding of the features of your tools, and, more probably, hard work at the
bench, experimenting with your creations. To make sure you are well prepared to take the
next step into the sandbox, I’ve assembled a list of thoughts, based on my own failures (and
successes):
■ Spend more time thinking about how a part could be structured effectively in a system than
in actually laying out and verifying the device. Systems benefit from extended periods of
thinking “what ifs” and “why nots.”
■ Do not be bashful about making system-wide changes to accommodate your tiny IC.
Others may not understand, and it is your responsibility to enlighten them about how the
system changes are beneficial.
■ Simulate every construction in SPICE thoroughly. Look at what happens when the model
changes from typical to slow, and fast as well. If it is an analog circuit, see how it performs
with supply transients and capacitive loading. This will save many hours at the probe
station later on.

■ Design your logic circuits so that they are extremely robust. If the logic simulation goes
perfectly, extend the setup and hold times on your flip-flop models until the simulation
fails, just to find the spots where close timing could exist.
■ Get a laser pointer (red) and shine it onto your chip to see where light affects it. Explain
why. This is for no other reason than to gain a better understanding for the device, in a
sense, like getting to know a person (for which I don’t suggest the use of lasers).
■ Do not assume the foundry DRC is perfect; carefully examine your GDS tapeout (after any
possible layer generation) to be sure all the layers are indeed there, and that the rules appear
to be followed. Tapeout time should not be a casual affair.
■ When analog parts come back from fab, get a feel for their potential weaknesses, and
certainly their characteristics, by observing their response under odd conditions—supply
voltage, temperature, and so forth. I like to poke the tip of an Xacto knife onto each pin
(inputs and all), to see where the extra loading (or induced noise) affects the part. You
would be surprised if anything changes while poking a power pin, but it often does—a
characteristic that could be a valuable clue to an otherwise mystifying problem.
■ Before you think you’re done, and the prototypes are ready for production, fire up your
cell phone and bring it near the test board. You may find the need to add small resistors in
series with some of the pins at the PCB level. In any case, if interference is found (which is
likely, at least at some distance), do your best to understand why. Complete immunity may
be impossible to achieve, but serious interference can usually be avoided.
■ If you think the project will take six months, double that number. In fact, take every first
estimate of time-to-completion, and double that. A very simple first chip could take one
year to prototype as you learn the tools. The second will be just as long, probably because
you’ve learned enough to make the second design much more complex. A three year
period from product inception to production silicon is not unreasonable, but if it is your
second or third project and the number of blocks is small, it could be wrapped up in six
months (uh, one year). Much of this time is spent waiting though, and if you are of the
sandbox mentality, that waiting time will be well spent—15 hours a day, thinking about
other ways to make the part better.
■ Expect that your first silicon will be a success, only because you have done so much work
in design and simulation to make that the only possible outcome. That said, do not be
discouraged if the first parts don’t work. Find out why and run it again. Probe, think,
simulate, examine, understand what went wrong, and make it right. This is not stuff for the
faint-hearted. Boldly go forward with substantiated confidence, or don’t go at all. This
commitment must be firmly established before you start.
■ Finally, good fortune can be had from meeting (and befriending) a process engineer who
loves his work. Much can be learned about process details, clever circuits, and others’
failures or successes. Do not pass up the opportunity to have in-depth conversations with
this fellow; you may, in fact, be able to help him through some test experiments on your
projects, since tests are expensive for him too; despite the first impression that he has free
rein in the fab, he doesn’t have an unlimited budget, and if he’s the right guy, there’s
no end to the tests he would like to get data from!
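The setup/hold stress test in the first tip above can be sketched with Verilog timing checks; the module name and the 5 ns limits here are placeholders of my own, meant to be exaggerated until the simulator starts reporting violations at the marginal paths:

```verilog
`timescale 1ns / 100ps

// Hypothetical flip-flop model with deliberately pessimistic timing
// checks. Widen the setup/hold limits until the simulation fails,
// then inspect the flagged paths for close timing.
module dff_checked (input wire clk, input wire d, output reg q);
  always @(posedge clk)
    q <= d;
  specify
    $setup(d, posedge clk, 5);  // exaggerated setup limit
    $hold(posedge clk, d, 5);   // exaggerated hold limit
  endspecify
endmodule
```

The simulator prints a timing-check violation for each path where data moves within the widened window, pointing directly at the spots where close timing could exist.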
I sincerely hope you take this step into the sandbox; that you find it intellectually rewarding
and personally satisfying. Believe me, it has the potential for this and more.
Oh, by the way, did I mention it yet? The sandbox is waiting...


Figure C5.1 Sandbox layers and their rendering.

Figure C5.2 Layer interconnections drawn as cells.

Figure C5.3 Test inverter.


Figure C5.4 Test full adder.

Figure C5.5 Test reset flip-flop.


Figure C5.6 Sandbox standard cell library.


Figure C5.6 A Sandbox standard cell library.


Figure C5.6 B Sandbox standard cell library.


Figure C5.6 C Sandbox standard cell library.


Figure C5.6 D Sandbox standard cell library.


Figure C6.1 VDD pad with snapback protection devices.


Figure C6.2 Supply clamp.


Figure C6.3 Basic I/O pad.


Figure C6.4 Schmitt section of I/O pad.

Figure C6.5 Tristate driver portion of I/O pad.


Figure C7.1 SRAM cell.

Figure C7.2 Dual SRAM cell.


Figure C7.3 Portion of SRAM array.

Figure C7.4 SRAM “tie.”


Figure C7.5 Array with tie.


Figure C7.6 SRAM I/O block cell.

Figure C7.7 SRAM top connection cell.


Figure C7.8 Buffer for address decoder.

Figure C7.9 16-bit ROM cell.

Figure C7.10 Array of ROM cells.

Figure C7.11 ROM wordline driver.



Figure C7.12 Dual differential DRAM cell.

Figure C7.13 DRAM “tie.”


Figure C7.14 DRAM sense amp detail.


Figure C14.1 Simple differential pair layout.

Figure C14.2 CMOS image sensor pixel for arraying.


Index

Active area, 6, 7
Adder, 93
ADC, 26
Alias frequency, 190, 321
Alpha emitters, 162
Amplifiers, 24, 217
compensation, 256
offset, 271
Anisotropic conductive film, 356
Antenna rules, 39
Arrays, 23, 127
ASIC, 1, 19
Attributes, 60
Autorouter, 23, 31, 60, 64, 76, 77, 92
considerations, 94
Avalanche, 118, 129
Averaging converter, 326

Backend, 9
Backlapping, 6, 352
Backside connection, 353
Bandgap
basic structure, 270
compensation, 278
half-bandgap, 281
regulator, 282
startup circuit, 272
Base layer, 13
Bias generator, 232–234
Binary coding, 184
Binary point, 200
Bipolar transistor, 7, 118, 126–128
dedicated collector, 7, 13
floating collector, 13
lateral, 15, 127
vertical NPN, 13
Biquad filter, 198
Bird’s beak, 231
Bloat, 67
Boat, 53
Body, 15
effect, 15, 224
Bonding, 43, 122
Bonding pads, 38, 121, 123
BPSG, 36, 37
Breakdown voltage, 117, 128
Bus hold, 154, 177
Camera, 27, 376
Capacitance
fringe, 62, 115
gate, 225
poly-poly, 85
source/drain, 225
Capacitors
coupling (stray), 14
external, 14
poly-poly, 9, 13, 14
MIM, 13, 14
MOS, 226
Carry propagation, 93
Cascode devices, 228
Channel, 12
length modulation, 221
resistance, 224
Chopper stabilization, 337
Clamp area, 45
Clock synchronization, 207
CMOS, 5, 15
processes, 19
COB (chip on board), 351
Coefficient accuracy, 197
Common centroid, 368
Common mode control, 334, 335
Comparator, 241, 242
Compensation, 256, 278
PLL, 297
Conductance, 14
specific, 35
MOS, 17, 21
Connections
capacitance, 14
contacts, 9, 10
examples, 86
substrate, 7, 123
well, 8
via, 10
Contacts, 9, 10, 36
exact size, 38, 84
to ground, 131
plugged, 37
Control store, 199
Convolution, 191
Core limited design, 44
Cost
bumping, 357
device, 22
standard cell, 49
leadframe tooling, 353
masks, 35, 36, 41
NRE, 45
prototyping, 32–34
test, 361
tools, 36, 42, 64, 65, 72, 77, 78
wafer, 54, 56
Costas loop, 310
Counters, 180–183
Cross-coupled pair, 145
Crossover distortion, 242
Crystal models, 292, 294
Current
density, 21, 74, 249
density, emitter, 268
drive, 24
gate, 298
mirror, 229, 230
source, 227
Current mirror amplifier, 245–247

DAC, 26
Decapsulation, 3
precautions, 4
Decimators, 193
Defect, 11
density, 43
Delay cells, 100
Delta sigma modulator, 213, 327, 328
high order, 339
Density rules, 39
Depletion region, 20, 117, 129
Depletion capacitance, 158
Derived layers, 66, 67
Design rules, 8, 82–84
Device recognition layers, 68
Dicing, 45, 352
saw, 353
Die, 1
Dielectric constants, 114
Differential techniques, 305–307, 334
wiring, 338
inputs, 367
Diffusion, 7
Digital filtering, 190–198
Digital oscillators, 207, 208
Diode, 6, 117
photo, 26
varicap, 119
Dopant, 6, 116, 117
Down bond, 124
Drain, 16
current, 17
DRAM, 155, 165
cell capacitance, 157, 158
sense amp, 159
sense signal, 160
soft errors, 162
leakage, 163
sense clamp, 165
DRC (design rule check), 46, 57
derived layers, 66, 67
rules, 66
Delta sigma modulator (DSM), 213, 214
DUT board (load board), 372

EEPROM, 21–22, 167, 168


Electromigration, 21
Enhancement, 15
Epitaxial layer, 119, 126
Etching, 34, 37
anisotropic, 168
Europractice, 34
EXP function, 204
Exponent generation, 203
EXT (extract), 57
device recognition layers, 68
example, 69
extract definition file, 68–70
parasitics, 71
Extension rule, 84


Fab, 4, 29
broker, 5
cost, 30
process, 5, 19
Fabless model, 30
Fanout, 90, 92, 101
Field oxide (FOX), 7
FIR filter, 191–193
Flash converters, 321
priority encoder for, 323
Flicker noise, 253
Flip-flop
derivation, 174
set-reset, 177
Floating point, 202
FM, MFM coding, 211
Focussed ion beam, 358, 359
Folded cascode amplifier, 248
FOX (field oxide), 7
Front end, 9
Fun, 1–383

Gate, 9
broken, 373
capacitance, 13, 225
current, 24
length, 17
width, 17
Gate oxide, 8
GDSII, 58, 61, 68
Grid, 11, 62, 84, 87
Ground bounce, 364
Ground conduction, 131

Hall device, 377


HDL (hardware description language), 51
Hierarchy, 58
High voltage devices, 13, 15, 20
Hot carrier effect, 168, 169
Hysteresis, 214, 215

IIR filters, 194–198


Image frequency, 309
Imaged area, 10
Impact ionization, 168
Implant, 6, 9, 69
Inductance, 14, 21, 115, 363
Inductor, 304
Instruction decoder, 199
Insulation, 6
Interpolators, 193
Inverter, 87
DC response, 88
propagation delay, 89
supply current, 90
I/O driver, 134

Junction, 6
Junction capacitance, 158
Junction FET, 317

Ladder DAC, 313


driver, 314
resistor layout, 316
Lambda effect, 221
Lambda rules, 63
Latchup, 20, 126, 154
Layers, 11, 61, 68
example, 85, 86
Layout, 57, 59, 61–65
automation, 64
colors, 68
hand placement, 64
zooming, 63
LCD driver, 357
Leadframe, 1, 21, 352, 354
Lead inductance, 363
Leakage, 14, 35, 163
testing, 372
Liberty file, 77
Lightly doped drain, 168
Linear region, 16, 223
Loading
capacitance, 14
LOG function, 204
Logic devices, 171
Logic levels, 101
Logic synthesis, 50
Look-ahead carry, 188
LVS (layout vs. schematic), 57, 72

Magnetic sensors, 377


Majority carriers, 117
Mantissa generation, 203
Marking, 354
Masks, 9, 10, 11
cost, 11, 35, 41
detail, 11
imaging, 42
Matching
devices, 161
resistors, 317
Measurement, 2
Memories, 23, 41, 139
active resistance in, 153
circular, 166
control, 150
DRAM, 155
I/O section, 144
ROM, 152
SRAM, 141
wordline driver, 147–150
wordline delay, 153
timing diagram, 147, 162
Metal
layers, 6, 14
resistance, 35
thickness, 21
wide, 38
Metal backing, 141
Microscope, epi, 2–4, 360
working distance, 2
numerical aperture, 2
power, 3
Miller capacitance, 9
Minimum run, 53
Minority carriers, 118, 124, 125
substrate injection, 130
Mixed signal, 1
Mixer, 27, 308
quadrature, 309
MLM (multi-level-mask), 32, 53–55
Mobility, 17
Modulation coding, 210
Molding, 354
MOSCAP, 17, 226, 364
MOSIS, 33
MPW (multi-project-wafer), 32, 53
Multipliers, 189
Multiplier-accumulator, 201

Narrow devices, 231


NDA, 5, 22
NMOS, 7, 9, 15, 17, 119
regions of operation, 218
Noise generator, 209
Noise
considerations, 255
flicker, 253
shot, 252
thermal, 252
Non-overlapping driver, 151
Numbering systems, 184, 186

Offset voltage, 24
Optical alignment target (OAT), 44
Optical sensors, 26, 374
Oscillators
crystal, 292
LC, 287, 288, 305
RC, 288–291
ring, 299

Packages, 351
list, 355, 356
Packaging
production, 46, 48, 351
prototype, 33
Pad limited design, 44
Pad pitch, 131
Parks-McClellan algorithm, 191
Peripheral bus, 86, 121
Phase comparison, 204–206
Phase locked loops, 295
compensation, 297
control, 300
instability, 296
jitter, 298
precautions, 302
predivider, 306
Phase margin, 257
Phase shifter, 310
Photomasks (See Masks)
Photocurrent, 26
Photodiode, 26
Pinchoff, 226
Pixel cell, 377
Place and route, 76
Planarization, 21, 36, 37
PMOS, 7, 9, 15, 17, 119
regions of operation, 218
Polysilicon, 6
resistivity, 12
resistors, 69
undoped, 12
Power distribution, 366
Power-on circuits, 369
Predivider, 306
Primitive device, 58
Priority encoder, 323
Probe pads, 45
Probing circuits, 360
Process control monitors (PCM), 45, 52
Propagation delay, 89, 102
Protection devices, 43, 120, 133
Pulse generators, 178
Pure play, 19, 30
PWM (pulse width modulation), 212


Quadrature mixer, 309

Ramp converter, 324, 325


Recovery time, 119
Rendering, 62
Resistivity
bulk, 12, 112, 115
areal density, 114
channel, 224
example calculation, 113, 115
materials, 113
metal, 14
sheet, 12, 85, 112, 114
polysilicon, 12, 38
power rails, 91
substrate, 124
Resistors, 14
layout, 279, 316
matching, 14, 279, 317
poly, 69
tolerances, 14, 317
Resonator Q, 13, 73
RFI control, 134, 135
Ring oscillator, 89

Sampling, 321
Saturation limiting, 186, 187
Saturation region, 221
Saturation voltage, 251
Schematic, 57, 59
Schmitt trigger, 133, 295, 370
Scribe lane, 43
Seal ring, 43, 121
Self aligned gate, 9
Semiconductors, 115–120
Serial interfaces, 209
SDF (standard delay format), 92
Shift registers, 179
Shrink, 63, 67
Silicide, 12
block, 12
Simulation, 57
logic, 77
Snapback, 128
SOC, 1
Source, 16
resistance, 219
Spacing rule, 84
SPICE, 57, 71, 72
example, 74
modelling, 17, 18, 73
Standard cells, 22, 81–109
cell height, 90, 93
drive resistance, 104
power distribution, 91
propagation delay, 89, 102
spacing, 93
VIA positioning, 94
width ratio, 89, 101
State variable filter, 198
Statistical variations, transistors, 160, 161
Strain sensor, 378
Step and repeat, 10
lenses, 55
Stipple patterns, 62
Stream output, 58
Substrate, 5, 7
diode, 6
grounding, 91, 368
Substrate bounce, 91, 365
Subthreshold, 16, 218, 234
slope, 16, 35, 163, 219
Successive approximation, 318
Supply
clamp, 131, 132
potential, 20, 48
boost circuits, 379, 380
Surround rule, 83–84
Switched capacitor, 330
differential techniques, 334
drive signals for, 330, 331, 349, 350
noise, 346
stray insensitivity, 332
Symbols, 59

Technology file, 62
Temperature coefficient
bandgap reference, 273, 276, 278
threshold, 16
subthreshold, 226
vertical PNP, 268
Temperature sensor, 284
Test, 46, 362, 371
leakage, 372
Thermal conductivity, 25
calculation example, 121
Thermal runaway, 128
Thin oxide (TOX), 8, 20, 35, 119
thick oxide, 12
Threshold voltage, 12, 16, 35
adjust, 12
logic, 103
Transconductance, 219, 235, 236, 250


Transistor (See NMOS, PMOS, Bipolar)


Transistor-resistor logic, 306, 307
Transmission gate, 175
Transmission lines, 14, 15
Tristate buffer, 133, 176, 297
Tools, 36
complexity, 56
(See Schematic, Layout, DRC, LVS, EXT, etc.)
Two’s complement, 185

User defined primitive, 106

Varicap, 119, 305


VCO, 25, 27
Verilog, 60, 75
example, 105, 108
Vertical PNP layout, 269
VHDL, 60, 75
Via, 10
exact size, 38, 86
stacked, 37, 83
standard cell positioning, 94

Wafer, 1
carrier, 53
polarity, 5
thickness, 5, 6
Wafer probe, 52, 360
Watch crystal, 294
Weak inversion, 16
Well, 6, 7
diode, 7
Wideband amplifiers, 264
Width ratio, 89, 101
Window function, 191
Wiring pitch, 94

Yield, 29, 43, 46

Zener diode, 20, 117, 118, 128, 170
