EUROPE
Research Institute for Integrated Circuits
Markus Schutti, Markus Pfaff, Richard Hagelauer
Freistädter Str. 315, A-4040 Linz, Austria, email: schutti@riic.at
Abstract
A fully synchronous design style with a single global clock is ideal, but not always possible. From time to time every designer has to cope with an implementation requiring multiple clock domains that are inherently asynchronous. Synthesis is usually not the main barrier, but recurrent hurdles during simulation, verification, timing analysis, and (scan) test integration can jeopardize the entire design. This article describes a design style developed for circuits with extensive data transfer between asynchronous clock domains, achieving an error-free and even warning-free synthesis flow. As a result the entire design can be examined properly by the (built-in) static timing analysis of Design Compiler, and scan test insertion can be done in a straightforward manner by Test Compiler without any tricks. The proposed design style does not place the common 3-stage shift-register synchronizing mechanism on every data path. Instead a more refined interface circuitry is presented, improving system robustness and cutting down implementation effort. The cost of a data transfer from the slower clock domain to the faster one is a simple 2:1 multiplexer, and in the reverse direction a 2:1 multiplexer plus one D-type flip-flop, certainly less hardware effort than any other synchronization scheme. The theory of this approach, typical clock/data waveforms, and the timing calculation are shown and discussed. The proposed design style has been successfully employed on a 25k gate design with two different clock domains (using partial shutdown of the clock branches), an embedded 8051 core, and several on-chip memories.
Introduction
ASICs developed nowadays often require different clocks. SoC (System-on-a-Chip) designs are inherently built of macros and IP cores, delivered by distinct suppliers, using separate clock domains. It must be assumed that these clock domains are asynchronous, having neither a common clock base nor an integer ratio of their frequencies. The specification of such an ASIC might comprise a microcontroller running at 10 MHz, supported by a special DSP core clocked with 85 MHz, communicating with a USB interface (12 MHz) and some peripheral parallel ports clocked by external components with 1.1 MHz, not to forget the real time clock operating at 32,768 Hz.
Figure 1. Clock Domain A and Clock Domain B, driven by Clock A and Clock B (clock signals completely asynchronous).
In this paper we limit the problem to only two different clock domains, as shown in Figure 1. However, the proposed interface circuitry can be extended to an arbitrary number of clocks. Data transfer takes place in both directions. The term Data Transfer covers bitwise or bit-parallel data without any underlying protocol.
The Problem
Complex digital circuits implemented in an HDL and synthesized by a synthesis tool like Design Compiler obey the synchronous-sequential design style. Such designs can be processed automatically; timing paths are examined by the built-in static timing analyzer. At the junction of clock domains, however, this design style is violated, which is a potential cause of malfunction when a signal has to be captured in a foreign clock domain.
Figure 2. Bitwise data transfer from flip-flop Data (Clock Domain A, Clock A) through combinational logic to flip-flop Sink (Clock Domain B, Clock B); clock signals completely asynchronous.
Figure 2 depicts such a conflict-causing signal path in detail. The path starts at a flip-flop sensitive to Clock A, passes some combinational logic (consisting of simple non-sequential logic elements) with further signal inputs, and ends at a flip-flop sensitive to Clock B. The circuit of Figure 2 is examined in detail through the waveforms given in Figure 3. Waveforms Clock A and Clock B represent the clock signals of flip-flops Data and Sink respectively. The content of flip-flop Data is loaded each and every cycle. This signal has to be latched into the target flip-flop.
Figure 3. Setup (and Hold) Time Violation at flip-flop Sink results in Metastability. The shaded areas mark time zones where no data change is allowed.
Since the frequency and phase of Clock A and Clock B are not related, the setup (and hold)¹ time of the target flip-flop (here: Sink) will be violated from time to time. The setup time condition is violated when data arrives at the D input of the flip-flop too close to the rising (active) edge of the clock (in Figure 3: Clock B). A setup time violation can cause metastability at the flip-flop. Metastability manifests itself either as hyperactivity (oscillation) or as an invalid logic level at output Q.² Metastability must be avoided: metastable elements increase power consumption, diminish system reliability, and cause malfunction.
Standard Approach
The common way of solving this problem is to use a series of D-type flip-flops (typically three), similar to a shift-register. Using a chain of flip-flops reduces the probability of metastability.
Figure 4. Standard approach: chains of flip-flops (FF) synchronize the signals crossing between Clock Domain A and Clock Domain B (clock signals completely asynchronous).
This means each data path going from one clock domain to another has to be provided with a 3-stage shift-register sensitive to the rising edge of the clock of the target clock domain. As illustrated in Figure 4 this approach is feasible but not optimal: in terms of hardware a 3-stage shift-register is expensive, and it introduces a large delay.
Smart Approach
The aim of the proposed circuitry is to cut down the hardware effort of the clock interface. To accomplish this objective the two clocks have to be distinguished as one clock running at a faster and one clock running at a slower frequency. Thus we will use the terms ClockF and ClockS (instead of A and B).
¹ The following text deals only with the setup time violation since most of the flip-flops used today have a hold time of zero. However, this simplification imposes no restriction because the proposed interface circuitry addresses a larger time zone around the active edge of the clock.
² For better characterization metastability is presented in the timing diagrams as oscillation.
Figure 5 gives an example with ClockF about 8 times faster than ClockS. In a first step we will focus only on the data transfer (DataS) from the slow clock domain to the fast clock domain. Hence signal DataS has its source at a flip-flop sensitive to ClockS and has to be captured at a flip-flop sensitive to ClockF, also passing some combinational logic on its path.
Figure 5. Waveforms of ClockF, ClockS, and SyncSignal; the shaded time zones mark potential setup time violations.
The shaded time zones mark those areas where data capture must be suspended to avoid a timing violation at the target flip-flop. Data capture (at the target flip-flop) is not allowed in the vicinity of the rising edge of ClockS, because a stable data signal cannot be expected during this period. The time immediately before the rising edge of ClockS cannot be determined: it is not possible to look into the future. Advanced means (e.g. an additional phase-shifted clock signal, a timer, a PLL) are out of scope of this article. The time immediately after the rising edge of ClockS can be detected by synchronizing the signal waveform of ClockS into ClockF. Using the above mentioned 3-stage shift-register for this purpose we get signal SyncSignal as shown in Figure 5 (signals SyncClk(1..3) being the three stages of the shift-register). Consequently the high period of SyncSignal represents the delayed high period of ClockS.
In other words: data capture can be enabled during the delayed high period³ of ClockS. During the delayed high period of ClockS no signal change has to be expected on a data line coming from that clock domain. Thus it is safe to strobe data into a flip-flop triggered by a foreign clock (ClockF), provided that this clock produces at least one active clock edge during this period of time. This condition is guaranteed as long as the timing requirement of section Clock Ratio is met.
Time Window
It follows that the high period of SyncSignal can be used as the valid time window for capturing data into a foreign clock domain. DataS is guaranteed to be stable for the duration of this time window. SyncSignal is generated by the standard 3-stage shift-register structure as shown in Figure 6. Please note that this signal is generated only once globally for the entire ASIC or, to be more precise, once for each junction of clock domains.
Figure 6. Generation of SyncSignal. ClockS is shifted through three D-type flip-flops (FFSYNC1, FFSYNC2, FFSYNC3) that are clocked by ClockF and cleared by Reset; the output of FFSYNC3 is SyncSignal.
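The behavior of the shift-register of Figure 6 can be sketched in a few lines of Python. This is only a cycle-based model under simplifying assumptions: ClockS is represented by its logic level as seen at each rising edge of ClockF, metastability of the first stage is not modeled, and the function name `sync_signal` and its parameters are hypothetical names chosen for this illustration.

```python
def sync_signal(clock_s_samples, n=3):
    """Cycle-based model of the n-stage shift-register generating SyncSignal.

    clock_s_samples: level of ClockS (0 or 1) seen at each rising edge of ClockF.
    Returns the level of SyncSignal after each of those edges.
    Assumption: all stages are cleared by Reset before the first edge.
    """
    stages = [0] * n
    trace = []
    for level in clock_s_samples:
        # On a rising edge of ClockF every stage takes the value of its predecessor.
        stages = [level] + stages[:-1]
        # SyncSignal is the output of the last stage.
        trace.append(stages[-1])
    return trace


# ClockS is low for two ClockF edges, high for four edges, then low again.
samples = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
print(sync_signal(samples))  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```

In this model the high period of ClockS reappears on SyncSignal with unchanged length, delayed by the shift-register, which is the delayed high period used as the time window.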
Some extra delay caused by combinational cells on the path from DataS to the target flip-flop DataSinkS is also taken into account. This delay has to be smaller than n times the clock period of ClockF (with n being the number of stages of the shift-register):

$$t_{combinational\ delay} < n \cdot T_{FAST}$$
Interface circuitry
The gate on the signal path from DataS to DataSinkS should be open during the time window (when SyncSignal is 1) and closed otherwise. This gate can be implemented as a simple multiplexer as shown in Figure 7 or via the Enable input of the target flip-flop as shown in Figure 8.
Figure 7. Interface Circuitry for Data Transfer from the Slow to the Fast Clock Domain using a 2:1 multiplexer.
The circuitry of Figure 7 allows the target flip-flop to be loaded with data from its own clock domain (e.g. to clear or to set the flip-flop), whereas the simpler circuitry of Figure 8 allows only the straightforward data capture.
Figure 8. Interface Circuitry for Data Transfer from the Slow to the Fast Clock Domain using a flip-flop with Enable.
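The two variants of Figures 7 and 8 can be compared with a small behavioral sketch in Python. It is a cycle-based model evaluated at each rising edge of ClockF, not the authors' HDL; the function `capture_slow_to_fast`, its parameters, and the choice of which multiplexer input carries the local data are assumptions made for this illustration.

```python
def capture_slow_to_fast(data_s, sync, local=None, reset_value=0):
    """Model of the target flip-flop DataSinkS at each rising edge of ClockF.

    data_s: value on the DataS path at each ClockF edge ("x" marks a value
            that may be changing and must not be captured).
    sync:   level of SyncSignal at each ClockF edge.
    local:  optional values from the fast clock domain for the second
            multiplexer input (Figure 7); if omitted, the flip-flop keeps its
            content while the gate is closed (Figure 8, Enable = SyncSignal).
    """
    q = reset_value
    trace = []
    for i, (d, s) in enumerate(zip(data_s, sync)):
        if s == 1:
            q = d            # gate open: capture data from the slow domain
        elif local is not None:
            q = local[i]     # gate closed: load data from the own clock domain
        trace.append(q)
    return trace


data_s = ["x", 5, 5, 5, 5, "x", 7, 7, 7, 7]
sync = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(capture_slow_to_fast(data_s, sync))                    # Figure 8 variant
print(capture_slow_to_fast(data_s, sync, local=[9] * 10))    # Figure 7 variant
```

In both variants the unstable value "x" never reaches DataSinkS, because SyncSignal is 0 whenever DataS may change.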
Figure 9. Waveforms for the data transfer from the fast to the slow clock domain.
The shaded time zones of Figure 9 indicate those areas where the data to be captured at the target flip-flop DataSinkF has to be stable. Obviously this condition does not hold for the source signal DataF. If the design engineer cannot ensure that DataF remains unmodified (not loaded with new data) during the shaded time zone, which is the case assumed here, an additional flip-flop is needed to freeze the data signal temporarily. The data signal has to be frozen in the region of the rising (active) edge of ClockS. This time period corresponds to the delayed low period of ClockS, covering the shaded time zones of Figure 9. The delayed low period of ClockS is the same as the low period of the above introduced SyncSignal.
Figure 10. Interface Circuitry for Data Transfer from the Fast to the Slow Clock Domain.
Therefore DataF is latched into a flip-flop sensitive to ClockF and locked there during the low period of SyncSignal. The output of this flip-flop is used to deliver the data signal safely to the target flip-flop DataSinkF. The crossing between the two clock domains takes place exactly on this path. However, this crossing is safe since the interface circuitry guarantees a stable data signal at the rising edge of ClockS.
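The freezing of DataF can be illustrated with a short Python sketch. It models only the interface flip-flop at each rising edge of ClockF; the function name `lock_fast_data` and the example values are hypothetical, and the source register is assumed to deliver a new value in every ClockF cycle.

```python
def lock_fast_data(data_f, sync, reset_value=0):
    """Model of the interface flip-flop at each rising edge of ClockF.

    While SyncSignal is 1 the flip-flop follows DataF; while SyncSignal is 0
    it keeps its content, so the slow domain sees a frozen value around the
    rising edge of ClockS.
    """
    locked = reset_value
    trace = []
    for d, s in zip(data_f, sync):
        if s == 1:
            locked = d
        trace.append(locked)
    return trace


# DataF changes in every ClockF cycle.
data_f = list(range(100, 110))
sync = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
locked = lock_fast_data(data_f, sync)
print(locked)  # [100, 101, 101, 101, 101, 101, 106, 107, 107, 107]

# The locked value never changes at an edge where SyncSignal is 0.
stable = all(locked[i] == locked[i - 1]
             for i in range(1, len(sync)) if sync[i] == 0)
print(stable)  # True
```

The values 102 to 105 are never transferred in this example: without an underlying protocol the slow clock domain simply samples whatever was locked last.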
Clock Ratio
The presented interface circuitry makes a clear distinction between the two transfer directions involved. It works as expected provided that ClockS runs sufficiently slower than ClockF. The following timing calculation is carried out separately for both transfer directions. n is a placeholder for the number of stages of the shift-register generating SyncSignal (in the examples above n is 3). A 50% duty cycle is assumed for both ClockS and ClockF.
Fast to Slow
In the worst case SyncSignal rises to high n cycles of ClockF after the rising edge of ClockS. Then it takes the high period of SyncSignal until DataF is locked in the interface flip-flop. Finally some combinational delay (or wire delay) and the setup time of the target flip-flop have to be considered before reaching the next rising edge of ClockS, where stable data is required to ensure safe data strobing. Thus we can say:
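$$n \cdot T_{FAST} + \frac{T_{SLOW}}{2} + t_{combinational\ delay} + t_{setup} \le T_{SLOW}$$

Assuming the combinational delay together with the setup time is a value in the range of the clock cycle time of ClockF ...

$$t_{combinational\ delay} + t_{setup} \approx T_{FAST}$$

... we can simplify the formula and get ...

$$T_{SLOW} \ge 2\,(n+1) \cdot T_{FAST} = 8 \cdot T_{FAST} \quad \text{with } n = 3$$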
Slow to Fast
For the data transfer from the slow to the fast clock domain the argument is very similar. In the worst case SyncSignal rises to high n cycles of ClockF after the rising edge of ClockS. Then it takes the high period of SyncSignal until the multiplexer gate is closed again. Finally some combinational delay (or wire delay) and the setup time of the target flip-flop have to be considered before we reach the next rising edge of ClockS, where we require stable data to ensure safe data strobing. Thus we have exactly the same calculation as before:

$$T_{SLOW} \ge 2\,(n+1) \cdot T_{FAST} = 8 \cdot T_{FAST} \quad \text{with } n = 3$$

The given formula is the upper limit of the frequency of the slower clock. The interface circuitry imposes no lower limit. Likewise there is no absolute boundary for the frequency; only the ratio of the two frequencies is the crucial factor.
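The clock ratio requirement can be checked numerically with a short Python sketch. It assumes the limit T_SLOW >= 2 (n + 1) T_FAST, which follows from the worst case described above (n cycles of ClockF, half a period of ClockS at a 50% duty cycle, and roughly one ClockF period for combinational delay plus setup time). The function names are hypothetical; the 85 MHz value is the DSP clock from the introduction and the ratios are those of the simulations in Figures 11 to 13.

```python
def max_slow_frequency(f_fast_hz, n=3):
    """Upper limit of the ClockS frequency for a given ClockF frequency.

    Assumptions: 50% duty cycle on both clocks, and combinational delay plus
    setup time approximately equal to one period of ClockF.
    """
    return f_fast_hz / (2 * (n + 1))


def ratio_is_supported(fast_to_slow_ratio, n=3):
    """True if a clock ratio of 1 : fast_to_slow_ratio meets the limit."""
    return fast_to_slow_ratio >= 2 * (n + 1)


# Highest ClockS frequency that can partner with an 85 MHz ClockF (n = 3).
print(max_slow_frequency(85e6))  # 10625000.0

for ratio in (31.1, 8.1, 4.3):
    print(ratio, ratio_is_supported(ratio))
```

Under these assumptions the ratios 1 : 31.1 and 1 : 8.1 satisfy the limit, while 1 : 4.3 does not.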
The simulation of Figure 11 was done with a clock ratio of 1 : 31.1. The transfer of data packets 568 and 600 (from the fast to the slow clock domain) is visible in the middle of the waveform. In the other direction data packets 115, 116, and 117 are transferred.
The simulation of Figure 12 was done with a clock ratio of 1 : 8.1, near the limit of the operating conditions. This can be seen in the transfer of data packet 961 (from the fast to the slow clock domain). The transfer was executed successfully. But if data packet 961 had been locked only one clock cycle of ClockF later, the target flip-flop would have become metastable: the next rising edge of ClockF is too near to the rising edge of ClockS!
The simulation of Figure 13 was done with a clock ratio of 1 : 4.3, a ratio far outside the operating conditions. The transfer of data packets 583, 600, and 617 (from the fast to the slow clock domain) fails because the data is not locked in the interface flip-flop. The content of the interface flip-flop changes near the rising edge of ClockS, causing metastability in the target flip-flop DataSinkF (illustrated as a fast oscillating or black waveform in the window). The same happens in the other transfer direction (from the slow to the fast clock domain). The transfer of data packet 210 fails because SyncSignal is still 1 near the rising edge of ClockF. That means the multiplexer gate on the transfer path is not closed in time, causing metastability in the target flip-flop DataSinkS.
Test Issues
Scan test insertion is almost mandatory for complex digital ASICs. Having multiple clock domains is always risky when doing scan insertion and test pattern generation. Fortunately the presented interface circuitry is recognized properly by Test Compiler. ATPG is also no hurdle. It is possible to arrange separate scan clocks for the different clock domains, and it is also possible to use a common scan clock for all clock domains together. If different scan clocks are used, the test designer should take care of the test variable test_capture_clock_skew and the command set_scan_configuration clock_mixing.
test_capture_clock_skew = no_skew | small_skew | large_skew
... can be used to avoid creating unreliable capture conditions and thus prevent Test Compiler ATPG from generating invalid vectors. The appropriate setting of the variable can be determined only when the timing of the test clocks is known.
set_scan_configuration clock_mixing no_mix | mix_edges | mix_clocks
... can be used to specify whether insert_scan includes cells from different clock domains in the same scan chain. If set to no_mix, scan chains inserted by insert_scan can contain only cells clocked by the same clock edge. If set to mix_edges, scan chains can contain cells clocked by different edges of the same clock. If set to mix_clocks, scan chains can contain cells clocked by different clocks.
Summary
The presented interface circuitry can be seen as a smart substitute for the standard n-stage shift-register synchronization approach. In terms of hardware the proposed interface is a significant simplification. Assume an ASIC with about 100 transfer signals (e.g. two 32-bit data busses, one 32-bit address bus, and some control signals) from one clock domain to another. The standard approach would cause a hardware effort of 100 × 3 × 10 = 3,000 gates (using a 3-stage shift-register and calculating with approx. 10 gates per (scan) flip-flop). Compared to that, the proposed interface circuitry requires only 100 × 3 = 300 gates for the direction from the slow to the fast clock domain, or 100 × (3 + 10) = 1,300 gates for the direction from the fast to the slow clock domain (calculating with approx. 3 gates per 2:1 multiplexer). The overhead for generating SyncSignal, a 3-stage shift-register (3 × 10 = 30 gates), is not significant.
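The gate count comparison can be reproduced with a few lines of Python. The per-cell figures (approx. 10 gates per scan flip-flop, approx. 3 gates per 2:1 multiplexer) are the estimates used in the text; the function and constant names are hypothetical and only serve this illustration.

```python
GATES_PER_FLIP_FLOP = 10  # approximate size of a (scan) flip-flop
GATES_PER_MUX = 3         # approximate size of a 2:1 multiplexer


def shift_register_gates(signals, stages=3):
    """Standard approach: one n-stage shift-register per transfer signal."""
    return signals * stages * GATES_PER_FLIP_FLOP


def slow_to_fast_gates(signals):
    """Proposed interface, slow to fast: one multiplexer per signal."""
    return signals * GATES_PER_MUX


def fast_to_slow_gates(signals):
    """Proposed interface, fast to slow: multiplexer plus flip-flop per signal."""
    return signals * (GATES_PER_MUX + GATES_PER_FLIP_FLOP)


def sync_signal_gates(stages=3):
    """One-time overhead of the shift-register generating SyncSignal."""
    return stages * GATES_PER_FLIP_FLOP


print(shift_register_gates(100))  # 3000
print(slow_to_fast_gates(100))    # 300
print(fast_to_slow_gates(100))    # 1300
print(sync_signal_gates())        # 30
```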
Figure 14. Data Synchronization by SyncSignal. Less hardware effort compared to the standard 3-stage shift-register approach.
Another benefit of the proposed interface circuitry is the better latency when transferring data to the slow clock domain. The transferred data is available in the slow clock domain not later than one and a half clock cycles of ClockS after new data has been issued in the fast clock domain. The conventional shift-register circuitry would have a latency of n to n+1 cycles of ClockS. Moreover, the interface circuitry is easy to implement, can be applied to a wide range of frequency ratios, confines the possibility of metastability to a single point (the shift-register generating SyncSignal) and thus improves system robustness, and saves power and area.