
Clock Domain Crossing:

Issues and Solutions


Udit Kumar, DCG
Sakshi Gupta, TRnD
Bhanu Prakash, DCG

March 12, 2012

[Figure: two clock domains, Clock Domain 1 and Clock Domain 2]
Agenda
What is Clock Domain Crossing (CDC)?
The problem and the trend
CDC errors lead to chip failure, with no software fix
What CDC issues look like
And recommendations for handling them
Special-case CDC issues
The design is CDC-clean, but silicon still has issues?
Different flows available (with Atrenta Spyglass)
When to use which
Appendix
List of Keywords
Keyword: Meaning
CDC: Clock domain crossing.
Structural check: Examines the structure of the design.
Functional check: Applies formal methods to prove that the design works as expected (similar in spirit to simulation).
Gray code: Encoding in which each state change flips only 1 bit.
Spyglass: An EDA tool from Atrenta for CDC analysis.
ip_block: Spyglass term for checking CDC at an IP boundary.
sgdc: Spyglass design constraints.
Abstract model: A CDC model of an IP that can be used at SoC level.
Clock in a Digital System
A digital system consists of combinational and sequential logic.
The clock triggers flops, leading to state changes (viz. state machines).
The clock is the fastest signal in a synchronous system.
[Figure: flops FF1 and FF2 driven by a common clock, Clk]
Main Properties Of Clock
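A clock has:
A period of repetition, Tp (linked with its frequency).
A phase, which depicts its rise and fall transitions.
A flop can be triggered on either clock phase (positive or negative).
[Figure: clock waveform showing the period Tp and the +ve and -ve phases]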
Domain of a clock
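The logic triggered by a clock (or by clocks derived from it) forms that clock's domain.
Such a domain is also known as a synchronous system.

Conversely, logic driven by clocks whose phase and frequency relationship is variable belongs to different clock domains.
Such domains are asynchronous to each other.
[Figure: a 100 MHz domain and a 27 MHz domain]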
Why have multiple clock domains?
SoCs have multiple interfaces with very different clock frequencies.

Why have multiple clock domains?
To reduce clock-tree balancing complexity, the clocks to various blocks can be treated as asynchronous.
[Figure: clocks that are synchronous but treated as asynchronous]
The CDC Path
When clocks are asynchronous, the signals that cross between their domains are called clock domain crossing (CDC) paths.
Within each domain, data transfer is protected by the setup and hold checks of the flops.
No such check exists on a CDC path.
[Figure: two domains, each covered by setup/hold checks, with an unchecked path between them]
CDC concerns asynchronous clock domains.
The CDC Paradigm

Heterogeneous applications have dozens of clocks.

Traditional STA checks and functional simulation do not verify CDC problems.

Responsibility for validating CDC issues is shifting from verification engineers to logic designers.

Why is it important?
Multiple chip failures around the world have been caused by this effect.
Past chip failures and their cost:
Wrong clock connections
IP data convergence issue
[Figure: example crossing with signals ctrl_req, ctrl_req_int, and soft_reset_active_stbus, and clocks gdp_proc_clk and st_ck]
Clock Domain Crossings: Dos and Don'ts
Metastability
A flip-flop needs its input to be stable before and after the clock edge (setup and hold time).

On a CDC path, setup and hold violations are bound to occur, because the input can change arbitrarily close to the capturing clock edge.

The flip-flop output may then take much longer than usual to reach a valid logic level. This is called metastability.
[Figure: data transition very close to the clock edge]
MTBF - Mean Time Between Failures
The reciprocal of the failure rate.
It should be as high as possible.
Here, failure means that a signal goes metastable after the first-stage synchronizer flop and is still metastable one cycle later, when it is sampled by the second-stage synchronizer flop.
[Figure: synchronizer MTBF equation annotated with its terms: synchronizing clock frequency, data changing frequency, and duration of metastable output (1/Tau, technology dependent)]
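The terms listed above match those of the synchronizer MTBF expression usually quoted in the literature. As a hedged sketch, with the symbol names $t_r$, $T_0$, $f_{clk}$, and $f_{data}$ assumed here for illustration, a commonly used form is:

```latex
\mathrm{MTBF} = \frac{e^{t_r / \tau}}{T_0 \, f_{clk} \, f_{data}}
```

Here $t_r$ is the time allowed for a metastable output to resolve, $\tau$ and $T_0$ are technology-dependent constants, $f_{clk}$ is the synchronizing clock frequency, and $f_{data}$ is the data changing frequency. A smaller $\tau$, or lower frequencies, gives a higher MTBF.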
Example: how to calculate MTBF
MTBF (Mean Time Between Failures): the average time a system runs between failures.

A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate the MTBF.
Number of failures per hour = (failure rate) * (number of components)
$(0.02 / 100) \times (1 / 1000) \times 4000 = 8 \times 10^{-4}$ per hour

$\mathrm{MTBF} = 1 / (8 \times 10^{-4}) = 1250$ hours
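The same arithmetic can be written as a small Python sketch. The function name and parameters are illustrative, and the model assumes identical, independent components whose failure rates simply add up.

```python
def system_mtbf_hours(failure_rate_percent, per_hours, num_components):
    """Return the system MTBF in hours for identical, independent components."""
    # Convert "percent per per_hours" into failures per hour for one component.
    per_component_rate = (failure_rate_percent / 100.0) / per_hours
    # Failure rates of independent components add up.
    system_rate = per_component_rate * num_components
    # MTBF is the reciprocal of the failure rate.
    return 1.0 / system_rate


# Values from the example: 4000 components, 0.02% per 1000 hours.
mtbf = system_mtbf_hours(0.02, 1000, 4000)
print(f"MTBF is about {mtbf:.0f} hours")
```

Doubling the component count halves the MTBF, which is why the number of synchronizers in a system matters.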


Clock crossing: Minimum Solution
"A synchronizer is a device that samples an asynchronous signal and outputs a signal that is synchronized to a local or sample clock" [1].
[Figure: synchronizer schematic and waveforms. Labels: asynchronous signal A, clocks C1 and Clk, flops FF1, FF2, and FF3, intermediate signals Da and AW, synchronized signal AS]
The synchronizing cell should come from a special cell library.
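A behavioral sketch in Python can illustrate the latency a two-flop synchronizer adds. This is only a cycle-level model under stated assumptions: the class and attribute names are hypothetical, and metastability itself is not modeled, only the two-edge delay before the output follows the input.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two back-to-back flops in the receiving domain."""

    def __init__(self):
        self.stage1 = 0  # first flop: the one that may go metastable in silicon
        self.stage2 = 0  # second flop: drives the synchronized output

    def clock_edge(self, async_in):
        """Apply one receiving-clock edge and return the synchronized output."""
        # Both flops update on the same edge, so stage2 takes the OLD stage1 value.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


sync = TwoFlopSynchronizer()
inputs = [0, 1, 1, 1, 0, 0, 0]
outputs = [sync.clock_edge(bit) for bit in inputs]
print(outputs)
```

The output changes on the second edge after the input changes, which is the price paid for giving the first flop a full cycle to resolve.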
Special Cell Properties
The total number of synchronizer instances in the system contributes to the system MTBF.
A synchronizer cell has a lower Tau.
Having a synchronizer is not enough; more rules must be followed!
No combinational logic at crossing point
An unconstrained path has delay imbalance, leading to loss of data and glitches.
Make sure that the CDC signal comes directly from a flop.
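Re-convergence after Sync
A chip killer!
Signals that are synchronized separately and then recombined can arrive in different cycles. This can also lead to bad FSM triggering.
[Figure: actual waveform at the re-convergence point]
Recommendation: compute the controls first, then do a single transfer across the domain.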
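The re-convergence hazard can be illustrated with a short Python enumeration. It is a sketch under one assumption: each separately synchronized bit may land in the destination domain either on the expected cycle or one cycle late. The function name is hypothetical.

```python
from itertools import product


def possible_first_samples(old, new):
    """Enumerate values the destination may see on the first cycle after a change.

    Each bit is synchronized on its own, so each one independently shows
    either its new value (delay 0) or still its old value (delay 1).
    """
    samples = set()
    for delays in product((0, 1), repeat=len(old)):
        sample = tuple(n if d == 0 else o for o, n, d in zip(old, new, delays))
        samples.add(sample)
    return samples


# The source changes a two-bit control from (0, 1) to (1, 0) in one cycle.
seen = possible_first_samples((0, 1), (1, 0))
print(sorted(seen))
```

The destination can observe (0, 0) or (1, 1), values the source never produced, which is why the controls should be combined before a single crossing.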



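Crossing from a fast to a slow domain
Data hold problem: the signal crosses from a fast clock domain to a slow clock domain, and a short pulse can be missed by the slow clock.

To avoid this, hold the data until a time-out expires (using pulse extenders).
[Figure: pulse extender in which a counter clocked by Clk1 holds D1/Q1 until the count equals N, for capture by clk2]
Hold the data for a minimum of 3 RX clock edges, until the transfer takes place (like traffic police).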
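The counter threshold for a pulse extender can be estimated with a short Python sketch. It assumes that a pulse spanning at least 3 receiver clock periods is seen by 3 receiver edges, and it ignores setup/hold margins and clock jitter; the function name is hypothetical.

```python
def min_hold_cycles(tx_hz, rx_hz, rx_edges=3):
    """Transmit-clock cycles needed to span rx_edges receive-clock periods."""
    # Integer ceiling of rx_edges * tx_hz / rx_hz, avoiding float rounding.
    return (rx_edges * tx_hz + rx_hz - 1) // rx_hz


# Example: crossing from a 100 MHz domain into a 27 MHz domain.
cycles = min_hold_cycles(100_000_000, 27_000_000)
print(cycles)
```

With these assumptions the 100 MHz side has to hold the data for 12 of its own cycles; a real design would add margin on top of this.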
Handshake based data transfer
In a handshake, the control path is synchronized and the data path follows.
Logic in the control path ensures that the transfer on the data bus is coherent.
[Figure: handshake between the clk_A domain and the clk_B domain]
Disadvantage: the delay in synchronizing the control signals (in both directions) reduces throughput.
Bulk data transfer using FIFO
When throughput is important, FIFO-based synchronizers fit well.
[Figure: FIFO between the clk_A domain and the clk_B domain]
Control bits transferred across the domains should be Gray coded.
Flow control uses the FIFO's overflow/underflow indications.
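Gray coding can be sketched in a few lines of Python. The helper names are illustrative, and the 3-bit pointer width is an assumption chosen only to keep the example small.

```python
def bin_to_gray(value):
    """Convert a binary count to its Gray-code equivalent."""
    return value ^ (value >> 1)


def gray_to_bin(gray):
    """Convert a Gray-code value back to binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


# Walk a 3-bit pointer through one full wrap and count bit flips per step.
DEPTH = 8
flips = [
    bin(bin_to_gray(i) ^ bin_to_gray((i + 1) % DEPTH)).count("1")
    for i in range(DEPTH)
]
print(flips)
```

Because only one bit changes per increment, including the wrap from 7 back to 0, a pointer sampled in the other domain is either the old or the new value, never an unrelated one.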


Types of CDC Checks
Structural checks
Look for the presence of corrective logic (viz. a synchronizer) at the crossing.
Functional checks
Formally verify that the protection is error free.
[Figure: structural CDC and functional (assertion-based) CDC, covering issue classes 1 to 3]
1. Missing or incorrect synchronizer.
2. Incorrectly implemented CDC protocols.
3. Re-convergence issue.
Why Functional CDC Checks?
They validate, by examining the hardware's behavior, questions such as:
Is the FIFO written while it is already full (overflow)?
Is the handshaking scheme functionally incorrect?
Does the Gray-code behavior break?
Not-so-obvious Silicon Issues

CDC is OK on RTL. Can we assume the silicon will work?
Multi-bit data bus CDC issue due to physical implementation
Logic in the serial (control) path is used to ensure that the transfer on the multi-bit (data) bus is coherent.
The loop delay of the control path should be less than the stable time of the data.
Physical View
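[Figure: data bus bits with path delays from 1 ns to 100 ns against a capture period of 5 ns, showing a huge delay imbalance that is not checked in the flow]
The data path is severely skewed because it has no timing constraints and because of physical constraints.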
Constraints for such paths
[Figure: the same bus, with path delays from 1 ns to 100 ns and a capture period of 5 ns, annotated with a skew limit for the bus]
The data bus needs to be constrained even though the transfer is on an asynchronous interface.
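A skew check of this kind can be sketched in Python. The function names are hypothetical, and using the capture period as the skew limit is an assumption made for illustration; the actual limit depends on the design and its protocol.

```python
def bus_skew_ns(bit_delays_ns):
    """Skew of a bus: slowest bit delay minus fastest bit delay."""
    return max(bit_delays_ns) - min(bit_delays_ns)


def skew_within_limit(bit_delays_ns, limit_ns):
    """True when all bits of the bus arrive within limit_ns of each other."""
    return bus_skew_ns(bit_delays_ns) <= limit_ns


# Delays from the physical view: one bit at 1 ns, another at 100 ns.
delays_ns = [1.0, 100.0]
capture_period_ns = 5.0  # assumed here to serve as the skew limit
print(bus_skew_ns(delays_ns), skew_within_limit(delays_ns, capture_period_ns))
```

With a 99 ns skew against a 5 ns capture period, bits launched together can be captured many cycles apart, so the bus needs a skew constraint even though no setup or hold check applies.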
Related Issues
Shoot-through within a clock domain
Clock assignment leading to shoot-through
Any assignment in the clock path can cause it.
[Figure: flops passing data_in to data_int to dout_out, with clocks clock and clock2]
VHDL: due to delta delay, shoot-through may occur if the capture clock is delayed.
Shoot-through at an IP: in Verilog, a non-blocking assignment can trigger new events.
How to solve the shoot-through problem

Silicon behavior is no longer the same as RTL simulation.
The root cause is a change in the scheduling of events in the RTL simulator.

How to avoid it?
Force event scheduling at such crossings with an explicit delay:
X_delayed <= x after 1 ns; (VHDL)
X_delayed <= #1 x; (Verilog)
NOTE: Forced scheduling should not be done on the clock path.

A checker for this is on the way.

Glitch across CDC paths
Agenda
Glitch in CDC
Impact of Glitch in Asynchronous circuits
Recommendations to avoid glitch in CDC
Different approaches to run CDC checks on SOC
Solution to CDC problems - SpyGlass tool
How does the Spyglass tool work?
Conclusion
Glitch in CDC
A glitch is an unwanted pulse, mostly created when multiple signals converge through combinational logic.
STA is not applicable to asynchronous interfaces.
Expected output: constant low
Actual output: low-high-low
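The low-high-low behavior can be reproduced with a tiny Python model of an AND gate whose two inputs switch in opposite directions. The function name and the unit-step time base are assumptions for illustration.

```python
def and_output(a_rise, b_fall, times):
    """Sample an AND gate where input A rises at a_rise and input B falls at b_fall."""
    return [int(t >= a_rise) & int(t < b_fall) for t in times]


times = range(4)
# Ideal case: both inputs switch at the same instant, so the output stays low.
ideal = and_output(a_rise=1, b_fall=1, times=times)
# Skewed case: B arrives one time unit late, so both inputs are briefly high.
skewed = and_output(a_rise=1, b_fall=2, times=times)
print(ideal, skewed)
```

The skewed case shows a one-unit pulse that a flop in the other domain may capture, since no timing check covers the crossing.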
Recommendations to avoid Glitches
Don't use combinational logic on the data path.
[Figure: data-path crossing examples between Ck1 and Ck2]
Recommendations to avoid Glitches
Don't use combinational logic on the control path.
[Figure: control-path crossing examples between Ck1 and Ck2, with output Q used as a qualifier]
Recommendations to avoid Glitches
Always use a glitch-free mux in RTL.
Tools can change a mux into AND-OR logic.

Now, Checking the CDC
Types of Checks
When to use what?


Different Approaches to Full-Chip CDC
Full chip, flat (single run)
CDC verification is done on the full SoC.
Crossings inside the various blocks are reported.
[Figure: Top_Block, with CDC verification done for crossings within and outside the blocks]
Different Approaches to Full-Chip CDC
Hierarchical approach: ip_block based
The blocks are already CDC clean.
Crossings inside the various blocks of the SoC are not reported.
Different Approaches to Full-Chip CDC
Hierarchical approach: abstraction based
The blocks are already CDC clean.
Crossings inside the various blocks of the SoC are not reported.
Conclusion on CDC checks
A new check is needed exclusively for CDC.
CDC checks should be done on both RTL and netlist (at least for glitches).
Structural checks should be done before functional CDC checks, to avoid noise.
SoC-level checks can be done at the interfaces of IPs.
References
[1] William J. Dally and John W. Poulton, Digital Systems Engineering, Cambridge University Press, 1998.
[2] Mark Litterick, Pragmatic Simulation-Based Verification of Clock Domain Crossing Signals and Jitter Using SystemVerilog Assertions, DVCon 2006. www.verilab.com/files/sva_cdc_paper_dvcon2006.pdf
[3] Clifford E. Cummings, Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog, SNUG 2008, Boston.
[4] Atrenta Spyglass, http://www.atrenta.com/solutions/spyglass-family/spyglass.htm

Thanks
Thanks to the DCG IP & SoC team for discussing various issues and for their partnership on CDC analysis in different projects.
APPENDIX
Why can shoot-through occur in Verilog?
A non-blocking assignment can trigger additional events in the same time step.
Ref.: SNUG 2002
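Verilog issue example
// Clock divider (fragment of the enclosing module; declarations are not shown).
// clock_half is driven by a non-blocking assignment, so its rising edge is a
// new event in the same time step as the clock edge that produced it.
always @(posedge clock or negedge rst_n)
begin
  if (rst_n == 1'b0) begin
    clock_half <= 1'b0;
  end else begin
    clock_half <= ~clock_half;
  end
end

// Stimulus: toggle enb each time cnt reaches 2.
always @(posedge clock or negedge rst_n)
begin
  if (rst_n == 1'b0)
  begin
    enb <= 1'b0;
    cnt <= 2'b0;
  end
  else
  begin
    cnt <= cnt + 1;
    if (cnt == 2'b10)
      enb <= ~enb;
    else
      enb <= enb;
  end
end

module dut(rst_n, clock, clock_half, enb, sig_b);

input rst_n, clock, clock_half, enb;
output sig_b;
reg sig_a;
reg sig_b;

// Capture flop on the divided clock. It may sample sig_a after sig_a has
// already been updated in the same time step (shoot-through).
always @(posedge clock_half or negedge rst_n)
begin
  if (rst_n == 1'b0) begin
    sig_b <= 1'b0;
  end else begin
    sig_b <= sig_a;
  end
end

// Launch flop on the main clock.
always @(negedge rst_n or posedge clock)
begin
  if (rst_n == 1'b0) begin
    sig_a <= 1'b0;
  end else begin
    sig_a <= enb;
  end
end
endmodule

[Figure: simulation waveform with steps 1 to 4 marking non-blocking assignment completion]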
Problem (c): Transfer of multiple signals across a CDC boundary
[Figure: multiple signals at the CDC boundary]
Solution (c): Use control signals to control the transfer
[Figure: control signal at the CDC boundary, with a forward path and a reverse path]
Reset Synchronization
Asynchronous resets must be de-asserted synchronously.
Asynchronous de-assertion of a reset may put the design into an unintended state.
Reset Synchronizer
[Figure: reset synchronizer built from flops F3 and F2 clocked by clk_B, taking reset and driving o_reset to the reset network, with waveforms for clk_B, reset, and o_reset]
Reset is asserted asynchronously, on reset and o_reset together.
De-assertion on reset is asynchronous; on o_reset it is synchronized and happens after two clock edges.
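The assert/de-assert behavior described above can be modeled at cycle level in Python. This is a sketch with assumptions: the reset is taken to be active-high, the class and attribute names are hypothetical, and metastability is not modeled.

```python
class ResetSynchronizer:
    """Two-flop reset synchronizer model (active-high reset is an assumption)."""

    def __init__(self):
        self.reset = 1   # asynchronous reset input, asserted at power-up
        self.stage1 = 1  # first flop output
        self.stage2 = 1  # second flop output, drives o_reset

    def set_reset(self, value):
        """Change the asynchronous reset input; assertion acts immediately."""
        self.reset = value
        if value:
            # Asynchronous assertion: both flops are forced without a clock edge.
            self.stage1 = 1
            self.stage2 = 1

    def clock_edge(self):
        """Apply one rising edge of the destination clock."""
        if not self.reset:
            # The first flop samples a constant "not in reset"; the second follows it.
            self.stage2, self.stage1 = self.stage1, 0

    @property
    def o_reset(self):
        return self.stage2


sync = ResetSynchronizer()
sync.set_reset(0)        # de-assert asynchronously
trace = [sync.o_reset]   # o_reset is still asserted before any clock edge
for _ in range(2):
    sync.clock_edge()
    trace.append(sync.o_reset)
sync.set_reset(1)        # re-assert: takes effect with no clock edge
trace.append(sync.o_reset)
print(trace)
```

The trace shows o_reset staying asserted until the second clock edge after de-assertion, and re-asserting immediately when reset is raised again.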
Little's Formula
At steady state, the number of processes arriving in a queue equals the number leaving it.

In this case, Little's formula applies:

$n = \lambda W$

where:
$n$ = the average queue length
$W$ = the average wait time
$\lambda$ = the average arrival rate of processes

If one knows two of the above variables, one can compute the third.
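Little's formula, n = (arrival rate) * W, can be wrapped in a small Python helper that returns whichever quantity is missing. The function and parameter names are illustrative.

```python
def littles_formula(n=None, arrival_rate=None, wait=None):
    """Given exactly two of the three quantities, return the third."""
    known = [value is not None for value in (n, arrival_rate, wait)]
    if sum(known) != 2:
        raise ValueError("provide exactly two of n, arrival_rate, wait")
    if n is None:
        return arrival_rate * wait   # n = lambda * W
    if arrival_rate is None:
        return n / wait              # lambda = n / W
    return n / arrival_rate          # W = n / lambda


# Example: 2 items arrive per cycle on average and each waits 3 cycles.
print(littles_formula(arrival_rate=2.0, wait=3.0))
```

For a FIFO between two domains, this gives a first estimate of average occupancy from the average arrival rate and the average time an entry waits.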
