Metastability caused by signals crossing between different clock domains (CDC: Clock Domain Crossing) is a problem in today's FPGA designs. Conventional structural verification alone is not enough to verify CDC signals. This column explains the CDC problem and how to verify it in four parts.

 

Part 2: What is Metastability?

Figure 2-1 shows data passing between two asynchronous clocks. In such a case, metastability may occur in the receiving FF (flip-flop).

 

Figure 2-1 Circuit example where metastability occurs

 

As explained in the previous column, an FF generally retains data in a loop circuit built from two inverters. If the input is closed off before the data has completed one round of the loop, the output signal may sit at an intermediate potential (see Figure 2-2). This is the metastable state.

 

Figure 2-2 Metastable waveform

 

If you are unlucky enough to land on an intermediate potential between Low (0) and High (1) and the circuit happens to be well balanced, it stays in the metastable state for a long time, like a surfer riding a good wave. Once the balance starts to collapse, the output settles to Low or High because it is a CMOS circuit.

 

The problems with metastability

Metastability is problematic in many ways. It is often the real cause of non-reproducible errors that were attributed to soft errors or device malfunctions. Some of the problems are listed below.

 

1. The metastable state can last significantly longer than the FF delay.
2. Even after the metastable state ends, it is uncertain whether the output will settle to Low or High.
3. It cannot be confirmed by simulation or by verification on the actual device.
4. It is not reproducible.
5. It is difficult to determine whether CDC countermeasures are effective.

 

The reasons for the increase in metastability

Metastability has become more common recently for the following reasons.

 

1. Higher operating frequency ⇒ More clock edges per unit time, which raises the probability of metastability
2. Increased number of clock domains
3. Increased number of registers
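
To make these three factors concrete, the following is a minimal sketch based on the commonly used synchronizer MTBF (mean time between failures) estimate, MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). The formula is not taken from this column, and the constants TAU and T_WINDOW are hypothetical placeholders rather than values for any real device. Treat the numbers as illustrative only.

```python
import math

# Hypothetical device constants (assumptions for illustration only).
TAU = 50e-12       # regeneration time constant of the FF loop, in seconds
T_WINDOW = 20e-12  # width of the vulnerable sampling window, in seconds


def crossing_mtbf(f_clk, f_data, logic_delay=0.0):
    """Estimate the MTBF (seconds) of one FF sampling an asynchronous signal.

    f_clk       : frequency of the receiving clock, in Hz
    f_data      : toggle rate of the incoming asynchronous data, in Hz
    logic_delay : time consumed by downstream logic and setup, in seconds
    """
    # Time left for a metastable output to resolve within one clock period.
    t_resolve = 1.0 / f_clk - logic_delay
    return math.exp(t_resolve / TAU) / (T_WINDOW * f_clk * f_data)


def design_mtbf(single_mtbf, n_crossings):
    """MTBF of a design with n identical, independent crossings."""
    return single_mtbf / n_crossings


slow = crossing_mtbf(f_clk=100e6, f_data=10e6)
fast = crossing_mtbf(f_clk=400e6, f_data=40e6)

print(f"100 MHz crossing: {slow:.3e} s")
print(f"400 MHz crossing: {fast:.3e} s")
print(f"400 MHz, 1000 crossings: {design_mtbf(fast, 1000):.3e} s")
```

Under these assumptions, a higher clock frequency hurts twice: there are more sampling edges per second, and each edge leaves less time for the output to resolve. More clock domains and more registers increase the number of crossings, which divides the MTBF of the whole design.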

 

Metastability is very difficult to verify

Metastability arises from a delicate balance of manufacturing process, temperature, voltage, and so on, so it cannot be verified by logic or timing simulation. A circuit in which it occurs about once per minute can be verified on an actual device, but a circuit in which it occurs about once every several months is difficult to verify even on an actual device.
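
The gap between "once per minute" and "once every several months" can be sketched numerically. The example below assumes failures arrive as a Poisson process with rate 1/MTBF, which is a modeling assumption and not something stated in this column. The one-hour test length and the three-month interval are likewise hypothetical.

```python
import math


def probability_of_observing(mtbf_seconds, test_seconds):
    """Probability of seeing at least one failure during a test run.

    Assumes failures follow a Poisson process with rate 1 / mtbf_seconds.
    """
    return 1.0 - math.exp(-test_seconds / mtbf_seconds)


MINUTE = 60.0
HOUR = 60.0 * MINUTE
MONTH = 30.0 * 24.0 * HOUR  # a 30-day month, for illustration

# A failure that occurs about once per minute, tested for one hour.
p_frequent = probability_of_observing(MINUTE, HOUR)

# A failure that occurs about once every three months, tested for one hour.
p_rare = probability_of_observing(3 * MONTH, HOUR)

print(f"once per minute, 1 h test:   {p_frequent:.6f}")
print(f"once per 3 months, 1 h test: {p_rare:.6f}")
```

With these assumptions, the frequent failure is almost certain to appear in a one-hour bench test, while the rare one will almost always go unseen, even though it still reaches the field.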

It is also difficult to judge whether countermeasures are effective. As a result, analyzing failures caused by metastability is very time consuming.
For example, an image processing IP created by our group company has only 3 clock domains, yet CDC caused frequent problems. It took about a month and a half to identify CDC as the cause.

 

In general, tools are used to extract possible metastability locations from the circuit structure, but they report hundreds of errors or warnings, or more.

Manually analyzing every one of these reports without missing anything is a very difficult task.
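
As a rough illustration of what structural extraction means, the sketch below flags every FF-to-FF connection whose two ends use different clocks. The FlipFlop type, the connection list, and all signal names are hypothetical and do not correspond to any real tool or netlist format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlop:
    name: str   # instance name (hypothetical)
    clock: str  # name of the clock that drives this FF


def find_cdc_paths(flops, connections):
    """Return (source, destination) pairs whose FFs use different clocks."""
    by_name = {ff.name: ff for ff in flops}
    crossings = []
    for src, dst in connections:
        if by_name[src].clock != by_name[dst].clock:
            crossings.append((src, dst))
    return crossings


# A toy design: one signal passes from clk_a into clk_b.
flops = [
    FlipFlop("tx_data", "clk_a"),
    FlipFlop("rx_meta", "clk_b"),
    FlipFlop("rx_sync", "clk_b"),
]
connections = [("tx_data", "rx_meta"), ("rx_meta", "rx_sync")]

crossings = find_cdc_paths(flops, connections)
for src, dst in crossings:
    print(f"CDC path: {src} -> {dst}")
```

A check like this only sees structure. It reports every crossing, whether or not the crossing is actually a problem, which is one reason the report count grows so large on a real design.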

  

That's all for now. Next time, I will explain general countermeasures against metastability and their problems.