Monday, January 28, 2013

FIFO

Depth of the Asynchronous FIFO




One of the most interesting architectural decisions in a design project is how to calculate the depth of a FIFO. A FIFO is intermediate logic in which data is buffered (stored) between a writer and a reader. A FIFO that is too shallow can overflow and lose data.

For the worst-case scenario, the difference between the write and read data rates should be at its maximum. Hence, the maximum data rate should be considered for the write operation and the minimum data rate for the read operation when calculating the depth of the FIFO.

Every asynchronous FIFO has a write frequency and a read frequency. Assume that the write frequency (Fw) is faster than the read frequency (Fr).

Scenario 1:

Fw = 1/Tw and Fr = 1/Tr, where Tw and Tr are the time periods of write and read respectively.

Now the transmitter (write side) wants to transmit "W" words of data, but the FIFO can take only "N" words of data in each Tw time period.
The time taken to transmit "W" words is (Tw/N) * W.

The receiver can read "P" words in each Tr time interval.
So the receiver can read ((Tw/N)*W*P)/Tr words in (Tw/N) * W time.

Subtract the data read from the FIFO from the data written into the FIFO.
Here the data written into the FIFO is "W" words.
The data read from the FIFO is ((Tw/N)*W*P)/Tr words.

FIFO size = W - ((Tw/N)*W*P)/Tr

Where

W = Maximum number of words that the transmitter can send
N = Number of words that the transmitter sends per Tw
Tw = Transmitter's time period
P = Number of words that the receiver receives per Tr
Tr = Receiver's time period
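To check the Scenario 1 formula numerically, here is a minimal Python sketch of FIFO size = W - ((Tw/N)*W*P)/Tr. The function name `fifo_depth_scenario1` is hypothetical, the inputs are assumed to be integers in a common time unit (for example ns), and rounding up to a whole word is an assumption, since a FIFO cannot store part of a word.

```python
import math
from fractions import Fraction


def fifo_depth_scenario1(w, n, tw, p, tr):
    """Minimum FIFO depth per Scenario 1 (synchronizer latency ignored).

    w  -- words the transmitter sends in one burst (W)
    n  -- words written per write period (N)
    tw -- write period (Tw), integer, same time unit as tr
    p  -- words read per read period (P)
    tr -- read period (Tr), integer
    """
    burst_time = Fraction(tw, n) * w      # (Tw/N) * W: time to write all W words
    words_read = burst_time * p / tr      # words drained during that time
    depth = w - words_read                # words still waiting in the FIFO
    return max(0, math.ceil(depth))       # round up; 0 if the reader keeps up


print(fifo_depth_scenario1(50, 1, 10, 1, 20))  # 25
```

With W = 50, N = 1 word per 10 ns and P = 1 word per 20 ns, the function returns 25, the same 25 leftover words as in Scenario 2 before synchronizer latency is added. Exact fractions are used so that results such as 25.0 do not drift to 25.000001 and round up by mistake.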

Scenario 2:
Consider a FIFO where Fw is 100 MHz and 50 words are written into the FIFO in every 100 write clocks, while Fr is 50 MHz and one word is read out on every read clock.
In the worst case, the 50 words are written into the FIFO as a back-to-back burst in 500 ns (50 clocks of 10 ns). In the same time, the read side can read only 25 words (500 ns / 20 ns per read) out of the FIFO. The remaining 25 words are read out of the FIFO during the 50 idle write clocks. So the depth of the FIFO should be at least 28: 25 words plus three clock cycles for synchronizer latency.
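The Scenario 2 reasoning can also be checked with a small time-stepped model that tracks the FIFO fill level during the burst. This is a Python sketch, not RTL; the function `peak_fill` is hypothetical, and it assumes that when a write edge and a read edge fall on the same nanosecond the write is applied first (with the opposite ordering the peak comes out one word higher).

```python
def peak_fill(burst_words, write_period_ns, read_period_ns):
    """Peak FIFO fill level for one back-to-back write burst vs. a steady reader.

    Time advances in 1 ns steps. A write happens on every write edge until the
    burst is done; a read happens on every read edge if the FIFO is not empty.
    Same-instant edges: write first, then read (assumption).
    """
    level = 0      # words currently stored
    peak = 0       # highest level observed
    written = 0
    for t in range(burst_words * write_period_ns):
        if t % write_period_ns == 0 and written < burst_words:
            level += 1
            written += 1
            peak = max(peak, level)
        if t % read_period_ns == 0 and level > 0:
            level -= 1
    return peak


SYNC_LATENCY = 3  # synchronizer cycles, as in Scenario 2
depth = peak_fill(50, 10, 20) + SYNC_LATENCY
print(depth)  # 28
```

The loop stops at the end of the burst because, once writes stop, the fill level can only fall.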

If we do not consider the synchronizer latency, the FIFO depth calculation is as follows.

One of the most common interview questions is how to calculate the depth of a FIFO. A FIFO is used as a buffering or queueing element in a system, and common sense says it is needed only when reading is slower than writing. The size of the FIFO therefore reflects the amount of data that must be buffered, which depends on the rate at which data is written and the rate at which data is read. In practice, the data rate in a system varies, mostly with the load on the system. So to obtain a safe FIFO size, we need to consider the worst-case data transfer across the FIFO under consideration.


For the worst-case scenario, the difference between the write and read data rates should be at its maximum. Hence, the maximum data rate should be considered for the write operation and the minimum data rate for the read operation.

So in the question itself, the read data rate is specified by the number of idle cycles, and for the write operation the maximum data rate, with no idle cycles, should be considered.


So for the write operation, we need to know the data rate = number of data words * clock rate. The writing side is the source and the reading side is the sink; the data rate of the reading side depends on the writing-side data rate and on its own reading rate, which is Frd/Idle_cycle_rd.



To know the data rate of the write operation, we need the number of data words in a burst, which we assume to be B.

Following this reasoning, the equation is: FIFO size = size to be buffered = B - B * Frd / (Fwr * Idle_cycle_rd).


Here we have not considered the synchronizing latency that applies if the write and read clocks are asynchronous. The greater the synchronizing latency, the larger the FIFO must be to buffer the additional data written during that latency.

Assume that we have to design a FIFO with the following requirements, and we want to calculate the minimum FIFO depth:
  • A synchronized FIFO
  • Writing clock 30 MHz - F1
  • Reading clock 40 MHz - F2
  • Writing burst size - B
  • Case 1: There is 1 idle clock cycle on the reading side - I
  • Case 2: There are 10 idle clock cycles on the reading side - I

FIFO depth = B - B * F2 / (F1 * I)

If we have alternate read cycles, i.e. there is one IDLE cycle between two read cycles, one word is read every 2 read clocks, so I = 2:
FIFO depth = B - B * F2 / (F1 * 2)

In our present problem, FIFO depth = B - B * 40 / (30 * 2)

= B(1 - 2/3)
= B/3
That means if our burst size is 10 words, the FIFO
depth = 10/3 = 3.33, which rounds up to 4.

If B = 20, FIFO depth = 20/3 = 6.67, which rounds up to 7,
or 8 (clocks are asynchronous).

If B = 30, FIFO depth = 30/3 = 10,
or 10 + 1 = 11 (clocks are asynchronous).
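The idle-cycle formula and the three worked examples can be reproduced with a short Python sketch. Here `read_interval` plays the role of I (2 for alternate read cycles), frequencies are integers in MHz so the ceiling division is exact, and `sync_margin` is a hypothetical parameter standing in for the extra word the examples add when the clocks are asynchronous.

```python
def fifo_depth_idle(burst, f_wr_mhz, f_rd_mhz, read_interval, sync_margin=0):
    """FIFO depth = B - B * Frd / (Fwr * I), rounded up to whole words.

    burst         -- words per write burst (B)
    f_wr_mhz      -- write clock in MHz (F1)
    f_rd_mhz      -- read clock in MHz (F2)
    read_interval -- read clocks per word read (I); 2 = one idle cycle between reads
    sync_margin   -- extra words for asynchronous clocks (assumed, e.g. 1)
    """
    num = burst * (f_wr_mhz * read_interval - f_rd_mhz)  # B * (F1*I - F2)
    den = f_wr_mhz * read_interval                       # F1 * I
    depth = -(-num // den) if num > 0 else 0             # integer ceiling division
    return depth + sync_margin


for b in (10, 20, 30):
    print(b, fifo_depth_idle(b, 30, 40, 2), fifo_depth_idle(b, 30, 40, 2, sync_margin=1))
```

Keeping the arithmetic in integers avoids floating-point surprises such as 30/3 evaluating to slightly more than 10 and being rounded up to 11.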


Verilog Design



STATIC TIMING ANALYSIS

• Pulse width
• Setup & hold times
• Signal slew
• Clock latency
• Clock skew
• Input arrival time
• Output required time
• Slack and critical path
• Recovery & removal times
• False paths
• Multi-cycle paths
• Sequential circuit timings
  - Maximum clock frequency
  - Maximum allowable clock skew
  - Global setup and hold times


Download Link:
https://hotfile.com/dl/191077178/d17fd8f/file-3.ppt.html

Sunday, January 13, 2013

FPGA FAQ


What is an FPGA?


A field-programmable gate array is a semiconductor device containing programmable logic components called "logic blocks", and programmable interconnects. Logic blocks can be programmed to perform the function of basic logic gates such as AND and XOR, or more complex combinational functions such as decoders or mathematical functions. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. A hierarchy of programmable interconnects allows logic blocks to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. Logic blocks and interconnects can be programmed by the customer or designer after the FPGA is manufactured to implement any logical function, hence the name "field-programmable". FPGAs are usually slower than their application-specific integrated circuit (ASIC) counterparts, cannot handle as complex a design, and draw more power (for any given semiconductor process). But their advantages include a shorter time to market, the ability to re-program in the field to fix bugs, and lower non-recurring engineering costs. Vendors can sell cheaper, less flexible versions of their FPGAs which cannot be modified after the design is committed. The designs are developed on regular FPGAs and then migrated into a fixed version that more closely resembles an ASIC.

What logic is inferred when there are multiple assign statements targeting the same wire?

It is illegal to drive the same wire with multiple assign statements in synthesizable code when that wire will become an output port of the module. The synthesis tools report an error that a net is being driven by more than one source.
However, it is legal to drive a three-state wire by multiple assign statements. 

What do conditional assignments get inferred into?

Conditionals in a continuous assignment are specified through the “?:” operator. Conditionals get inferred into a multiplexor. For example, the following is the code for a simple multiplexor

assign wire1 = (sel==1'b1) ? a : b; 

 

What value is inferred when multiple procedural assignments are made to the same reg variable in an always block?

When multiple nonblocking assignments are made to the same reg variable in a sequential always block, the last assignment is picked up for logic synthesis. For example:

always @ (posedge clk) begin
   out <= in1 ^ in2;
   out <= in1 & in2;
   out <= in1 | in2;
end

 

In the example just shown, the OR is the last assignment. Hence, the logic synthesized is indeed an OR gate. Had the last assignment used the "&" operator, an AND gate would have been synthesized.

1) What are the minimum and maximum frequencies of the DCM in the Spartan-3 series FPGA?

Spartan series DCMs have a minimum frequency of 24 MHz and a maximum of 248 MHz.

2) Tell me some of the constraints you used and their purpose in your design.

There are a lot of constraints, and they vary from tool to tool; I am listing some Xilinx constraints.
a) translate_on and translate_off: the Verilog code between the translate_off and translate_on directives is ignored for synthesis.
b) CLOCK_SIGNAL: is a synthesis constraint. In the case where a clock signal goes through combinatorial logic before being connected to the clock input of a flip-flop, XST cannot identify what input pin or internal net is the real clock signal. This constraint allows you to define the clock net.
c) XOR_COLLAPSE: is a synthesis constraint. It controls whether cascaded XORs should be collapsed into a single XOR.
For a detailed description of more constraints, refer to the Constraints Guide.

3) Suppose that for one piece of code the equivalent gate count is 600 and for another it is 50,000. Will the size of the bitmap change? In other words, will the size of the bitmap change if the gate count changes?

The size of the bitmap is independent of resource utilization; it is always the same. For the Spartan xc3s5000 it is 1.56 MB and will never change.

4) What are the different types of FPGA programming modes? Which one are you currently using? How do you change from one to another?

Before powering on the FPGA, configuration data is stored externally in a PROM or some other nonvolatile medium, either on or off the board. After power is applied, the configuration data is written to the FPGA using any of five different modes: Master Parallel, Slave Parallel, Master Serial, Slave Serial, and Boundary Scan (JTAG). The Master and Slave Parallel modes
The mode-select pins are set to choose the mode; refer to the data sheet for further details.

5) Tell me some of the features of the FPGA you are currently using.

I am taking the xc3s5000 as an example to answer the question.

• Very low cost, high-performance logic solution for high-volume, consumer-oriented applications
- Densities as high as 74,880 logic cells
- Up to 784 I/O pins
- 622 Mb/s data transfer rate per I/O
- 18 single-ended signal standards
- 6 differential I/O standards including LVDS, RSDS
- Termination by Digitally Controlled Impedance
- Signal swing ranging from 1.14V to 3.45V
- Double Data Rate (DDR) support
• Logic resources
- Abundant logic cells with shift register capability
- Wide multiplexers
- Fast look-ahead carry logic
- Dedicated 18 x 18 multipliers
- Up to 1,872 Kbits of total block RAM
- Up to 520 Kbits of total distributed RAM
• Digital Clock Manager (up to four DCMs)
- Clock skew elimination
• Eight global clock lines and abundant routing

6) What is the gate count of your project?

Well, mine was 3.2 million; I don't know yours!

7) Can you list some synthesizable and non-synthesizable constructs?

Not synthesizable:
- initial: ignored for synthesis.
- delays: ignored for synthesis.
- events: not supported.
- real: real data type not supported.
- time: time data type not supported.
- force and release: force and release of data types not supported.
- fork join: use nonblocking assignments to get the same effect.
- user defined primitives: only gate-level primitives are supported.

Synthesizable constructs:
- assign, for loop, gate-level primitives, repeat with a constant value...

8) Can you explain what stuck-at-zero means?

These stuck-at problems appear in ASICs. Sometimes a node is permanently tied to 1 or 0 because of a fault. To detect such faults, we need to provide testability in the RTL. If the node is permanently 1, it is called stuck-at-1; if it is permanently 0, it is called stuck-at-0.
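To make the idea concrete, the sketch below shows how a stuck-at fault is detected: an input vector detects the fault if the good circuit and the faulty circuit produce different outputs. The circuit y = (a AND b) OR c and its internal node n1 are made up for illustration; real ATPG tools work on full netlists rather than hand-written Python functions.

```python
from itertools import product


def good_circuit(a, b, c):
    """Hypothetical example circuit: n1 = a AND b, y = n1 OR c."""
    n1 = a & b
    return n1 | c


def faulty_circuit(a, b, c, stuck_value):
    """Same circuit with internal node n1 stuck at a constant (0 or 1).

    a and b no longer influence n1, which is exactly the fault being modeled.
    """
    n1 = stuck_value
    return n1 | c


def detecting_vectors(stuck_value):
    """All (a, b, c) vectors whose output differs between good and faulty."""
    return [v for v in product((0, 1), repeat=3)
            if good_circuit(*v) != faulty_circuit(*v, stuck_value)]


print(detecting_vectors(0))  # [(1, 1, 0)]
print(detecting_vectors(1))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

Stuck-at-0 on n1 is caught only by a = b = 1 with c = 0, because c = 1 would mask n1 at the OR gate; this masking is why testability has to be considered in the design.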

9) Can you draw the general structure of an FPGA?

 

10) What is the difference between an FPGA and a CPLD?

FPGA:
a)SRAM based technology.
b)Segmented connection between elements.
c)Usually used for complex logic circuits.
d)Must be reprogrammed once the power is off.
e)Costly

CPLD:
a)Flash or EPROM based technology.
b)Continuous connection between elements.
c)Usually used for simpler or moderately complex logic circuits.
d)Need not be reprogrammed once the power is off.
e)Cheaper 

11) What are DCMs? Why are they used?

Digital clock manager (DCM) is a fully digital control system that
uses feedback to maintain clock signal characteristics with a
high degree of precision despite normal variations in operating
temperature and voltage. 
That is, the clock output of the DCM is stable over a wide range of temperature and voltage, the skew associated with the DCM is minimal, and all phases of the input clock can be obtained. The output of the DCM, coming from a global buffer, can handle more load.

12) FPGA design flow? 

 

Also, please refer to the synthesis presentation in the presentation section of this site.

13) What are a slice, a CLB and a LUT?

I am taking the xc3s500 as an example to answer this question.

The Configurable Logic Blocks (CLBs) constitute the main logic resource for implementing synchronous as well as combinatorial circuits.
CLBs can be configured as combinational logic, RAM or ROM depending on the coding style.
Each CLB consists of 4 slices, and each slice consists of two 4-input LUTs (look-up tables), the F-LUT and the G-LUT.

14) Can a CLB be configured as RAM?

Yes.

In the coding style that infers it, the memory write is a clocked behavioral assignment, reads from the memory are asynchronous, and all the address lines are shared by the read and write statements.

15) What is the purpose of a constraint file, and what is its extension?

The UCF file is an ASCII file specifying constraints on the logical design; its extension is .ucf. You create this file and enter your constraints in it with a text editor. You can also use the Xilinx Constraints Editor to create constraints within a UCF file. These constraints affect how the logical design is implemented in the target device. You can use the file to override constraints specified during design entry.

16) Which FPGA are you currently using, and what were the main reasons for choosing it?

17) Draw a rough diagram of how the clock is routed throughout the FPGA.

 

18) How many global buffers are there in your current FPGA, and what is their significance?

There are 8 of them in the xc3s5000.
An external clock source enters the FPGA using a Global Clock Input Buffer (IBUFG), which directly accesses the global clock network or an Input Buffer (IBUF). Clock signals within the FPGA drive a global clock net using a Global Clock Multiplexer Buffer (BUFGMUX). The global clock net connects directly to the CLKIN input. 

19) What is the frequency of operation and the equivalent gate count of your project?

20) Tell me some of the timing constraints you have used.

21) Why is the map -timing option used?

Timing-driven packing and placement is recommended to improve design performance, timing, and packing for highly utilized designs.

22) What are the different types of timing verification?

Dynamic timing:
a. The design is simulated in full timing mode.
b. Not all possibilities tested as it is dependent on the input test vectors.
c. Simulations in full timing mode are slow and require a lot of memory.
d. Best method to check asynchronous interfaces or interfaces between different timing domains.
Static timing:
a. The delays over all paths are added up.
b. All possibilities, including false paths, verified without the need for test vectors.
c. Much faster than simulations, hours as opposed to days.
d. Not good with asynchronous interfaces or interfaces between different timing domains.
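The static-timing idea that "the delays over all paths are added up" can be sketched as a longest-path computation over a directed acyclic graph of gates, which covers every path without any test vectors. In this Python sketch the node names and delays are invented, delays are integers in picoseconds, and a real STA tool additionally models setup/hold checks, clock skew and slew.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def arrival_times(delays, fanin):
    """Latest arrival time at every node of a combinational DAG.

    delays -- node -> delay through that node (ps)
    fanin  -- node -> list of nodes driving it (start points may be omitted)
    """
    arrival = {}
    # static_order() yields every node after all of its drivers.
    for node in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[src] for src in fanin.get(node, [])), default=0)
        arrival[node] = latest_input + delays[node]
    return arrival


delays = {"in": 0, "g1": 100, "g2": 200, "g3": 150, "out": 0}
fanin = {"g1": ["in"], "g2": ["in"], "g3": ["g1", "g2"], "out": ["g3"]}
print(arrival_times(delays, fanin)["out"])  # 350
```

Taking the maximum at each gate input means the slower branch through g2 (200 ps) sets the arrival time at g3, so the critical path is in -> g2 -> g3 -> out.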

23) Compare PLL and DLL.

PLL:
PLLs have disadvantages that make their use in high-speed designs problematic, particularly when both high performance and high reliability are required. 
The PLL voltage-controlled oscillator (VCO) is the greatest source of problems. Variations in temperature, supply voltage, and manufacturing process affect the stability and operating performance of PLLs.

DLLs, however, are immune to these problems. A DLL in its simplest form inserts a variable delay line between the external clock and the internal clock. The clock tree distributes the clock to all registers and then back to the feedback pin of the DLL.
The control circuit of the DLL adjusts the delays so that the rising edges of the feedback clock align with the input clock. Once the edges of the clocks are aligned, the DLL is locked, and both the input buffer delay and the clock skew are reduced to zero.
Advantages:
· precision
· stability
· power management
· noise sensitivity
· jitter performance.


24) Given two ASICs, one with a setup violation and the other with a hold violation, how can they be made to work together without modifying the design?

Slow the clock down on the one with setup violations,
and add redundant logic (extra delay) in the path where you have hold violations.

25) Suggest some ways to increase the clock frequency.

· Check the critical path and optimize it.
· Add more timing constraints (over-constrain).
· Pipeline the architecture to the maximum possible extent, keeping in mind the latency requirements.
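The pipelining advice follows from the register-to-register timing relation: minimum period = clock-to-Q delay + combinational delay + setup time. This Python sketch ignores clock skew, uses made-up delay values, and assumes the logic can be split into perfectly equal stages, which is rarely exact in practice.

```python
def fmax_mhz(t_clk_to_q_ns, t_logic_ns, t_setup_ns):
    """Maximum clock frequency (MHz) of one register-to-register path.

    Minimum period = clock-to-Q delay + combinational delay + setup time
    (clock skew is ignored in this sketch).
    """
    period_ns = t_clk_to_q_ns + t_logic_ns + t_setup_ns
    return 1000.0 / period_ns


def pipelined_fmax_mhz(t_clk_to_q_ns, t_logic_ns, t_setup_ns, stages):
    """Same path split into `stages` equal pieces with registers in between.

    Assumes the logic divides perfectly evenly (an idealisation);
    each extra stage also adds one clock cycle of latency.
    """
    return fmax_mhz(t_clk_to_q_ns, t_logic_ns / stages, t_setup_ns)


print(fmax_mhz(1, 8, 1))               # 100.0
print(pipelined_fmax_mhz(1, 8, 1, 2))  # about 166.7
```

Note that the gain is less than 2x for two stages, because the clock-to-Q and setup overhead is paid once per stage.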

26) What is the purpose of DRC?

DRC (design rule check) is used to check whether a particular schematic and its corresponding layout (especially the mask sets involved) conform to a predefined rule set that depends on the technology used for the design. The rules are parameters set by the semiconductor manufacturer for how the masks should be placed, connected and routed, so that variations in the fab process do not affect normal functionality. A rule usually specifies the minimum allowable configuration.

27) What is LVS and why do we do it? What is the difference between LVS and DRC?

The layout must be drawn according to certain strict design rules. DRC helps in the layout of designs by checking whether the layout abides by those rules.
After the layout is complete, we extract the netlist. LVS (layout versus schematic) compares the netlist extracted from the layout with the schematic to ensure that the layout is an identical match to the cell schematic.

28) What is DFT?

DFT means design for testability ("Design for Test" or "Design for Testability"): a methodology that ensures a design can be tested properly after manufacturing, which facilitates failure analysis and the detection of faulty parts.
Other than the functional logic, you need to add some DFT logic to your design. This helps you test the chip for manufacturing defects after it comes back from the fab. Scan, MBIST, LBIST, IDDQ testing etc. are all part of this (this is a hot field with lots of opportunities).

29) There are two major FPGA companies: Xilinx and Altera. Xilinx tends to promote its hard processor cores and Altera tends to promote its soft processor cores. What is the difference between a hard processor core and a soft processor core? 

A hard processor core is a pre-designed block that is embedded onto the device. In the Xilinx Virtex II-Pro, some of the logic blocks have been removed, and the space that was used for these logic blocks is used to implement a processor. The Altera Nios, on the other hand, is a design that can be compiled to the normal FPGA logic. 

30) What is the significance of contamination delay in sequential circuit timing?

Look at the figure below. tcd is the contamination delay. 

 

Contamination delay tells you if you meet the hold time of a flip flop. To understand this better please look at the sequential circuit below. 

 

The contamination delay of the data path in a sequential circuit is critical for the hold time at the flip-flop where the path ends, in this case R2.
Mathematically, th(R2) <= tcd(R1) + tcd(CL2)
Contamination delay is also called tmin and Propagation delay is also called tmax in many data sheets. 
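The hold inequality above can be turned into a hold-slack check. In this Python sketch, tcd(R1) is taken to be the contamination clock-to-Q delay of R1 and tcd(CL2) the contamination delay of the logic between R1 and R2; clock skew is ignored, as in the formula, and the picosecond values are invented.

```python
def hold_slack_ps(t_cd_r1, t_cd_cl2, t_hold_r2):
    """Hold slack at R2 in ps: positive means th(R2) <= tcd(R1) + tcd(CL2) holds.

    t_cd_r1   -- contamination (minimum) clock-to-Q delay of launching flop R1
    t_cd_cl2  -- contamination (minimum) delay of combinational logic CL2
    t_hold_r2 -- hold time requirement of capturing flop R2
    """
    return (t_cd_r1 + t_cd_cl2) - t_hold_r2


print(hold_slack_ps(300, 200, 400))  # 100  -> hold met
print(hold_slack_ps(100, 0, 400))    # -300 -> hold violated
```

The second case shows why a direct flop-to-flop connection with almost no logic is the usual place to find hold violations, and why adding delay in such paths fixes them.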

31) When are DFT and formal verification used?

DFT:
· Testing for manufacturing defects such as stuck-at-0 or stuck-at-1.
· Testing that a set of rules was followed during the initial design stage.

Formal verification:
· Verification of the operation of the design, i.e. checking whether the design follows the spec.
· Checking whether the gate netlist is equivalent to the RTL.
· Using mathematical proof, rather than simulation, to check for equivalence.
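Equivalence checking ("is the gate netlist equal to the RTL?") can be illustrated on a tiny combinational block by comparing every input combination. This brute-force Python sketch only scales to a handful of inputs; production equivalence checkers use formal techniques such as BDDs or SAT solving instead of enumeration. Both mux functions are hypothetical stand-ins for an RTL description and its gate-level netlist.

```python
from itertools import product


def rtl_mux(sel, a, b):
    """RTL intent, like: assign y = sel ? a : b;"""
    return a if sel else b


def gate_mux(sel, a, b):
    """Gate-level version: y = (sel AND a) OR (NOT sel AND b)."""
    return (sel & a) | ((1 - sel) & b)


def equivalent(f, g, n_inputs):
    """Compare two Boolean functions on all 2**n_inputs input vectors.

    Returns (True, None) if they match everywhere, otherwise
    (False, first_counterexample).
    """
    for vec in product((0, 1), repeat=n_inputs):
        if f(*vec) != g(*vec):
            return False, vec
    return True, None


print(equivalent(rtl_mux, gate_mux, 3))                        # (True, None)
print(equivalent(rtl_mux, lambda s, a, b: (s & a) | b, 3))     # (False, (1, 0, 1))
```

The counterexample returned for the broken netlist is what makes equivalence checking useful in practice: it points directly at an input combination where the two descriptions disagree.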

32) What is synthesis?

Synthesis is the stage in the design flow which is concerned with translating your Verilog code into gates - and that's putting it very simply! First of all, the Verilog must be written in a particular way for the synthesis tool that you are using. Of course, a synthesis tool doesn't actually produce gates - it will output a netlist of the design that you have synthesised that represents the chip which can be fabricated through an ASIC or FPGA vendor. 

33) We need to sample an input or output something at different rates, and I need to vary the rate. What's a clean way to do this?

Many, many problems have this sort of variable-rate requirement, yet we are usually constrained to a constant clock frequency. One trick is to implement a digital NCO (Numerically Controlled Oscillator). An NCO is actually very simple and, while it is most naturally understood as hardware, it can also be constructed in software. The NCO, quite simply, is an accumulator to which you keep adding a fixed value on every clock (i.e. at a constant clock frequency). When the NCO "wraps", you sample your input or perform your action. By adjusting the value added to the accumulator each clock, you finely tune the AVERAGE frequency of that wrap event. Now, you may have realized that the wrapping event may have lots of jitter on it. True, but you may use the wrap to increment yet another counter, where each additional divide-by-2 bit reduces this jitter. The DDS (Direct Digital Synthesizer) is a related technique. I have two examples showing both an NCO and a DDS in my File Archive. This is tricky to grasp at first, but tremendously powerful once you have it in your bag of tricks. NCOs also relate to digital PLLs, timing recovery, TDMA and other "variable rate" phenomena.
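Here is a behavioral sketch of the NCO described above, written as a Python model rather than RTL. The function `nco_wraps` and its parameters are hypothetical; it assumes `increment` is smaller than 2**acc_bits, so at most one wrap can occur per clock.

```python
def nco_wraps(increment, acc_bits, n_clocks):
    """Phase-accumulator NCO: add `increment` every clock, report wrap clocks.

    The accumulator is acc_bits wide, so it wraps modulo 2**acc_bits.
    Average wrap rate = f_clk * increment / 2**acc_bits.
    """
    modulus = 1 << acc_bits
    acc = 0
    wraps = []
    for clk in range(n_clocks):
        acc += increment
        if acc >= modulus:      # carry out of the top bit is the "wrap"
            acc -= modulus
            wraps.append(clk)   # sample the input / perform the action here
    return wraps


print(nco_wraps(3, 3, 8))  # [2, 5, 7]
```

With a 3-bit accumulator and an increment of 3, wraps occur on clocks 2, 5 and 7: on average 3 wraps every 8 clocks, but spaced 3, 3 and then 2 clocks apart, which is the jitter the text mentions. Doubling the accumulator width while doubling the increment keeps the same average rate.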

Verilog FAQ


Following is the Verilog code for flip-flop with a positive-edge clock.
module flop (clk, d, q);
   input  clk, d;
   output q;
   reg    q;
   always @(posedge clk)
   begin
      q <= d;
   end
endmodule


Following is Verilog code for a flip-flop with a negative-edge clock and asynchronous clear.
module flop (clk, d, clr, q);
   input  clk, d, clr;
   output q;
   reg    q;
   always @(negedge clk or posedge clr)
   begin
      if (clr)
         q <= 1'b0;
      else
         q <= d;
   end
endmodule


Following is Verilog code for the flip-flop with a positive-edge clock and synchronous set.
module flop (clk, d, s, q);
   input  clk, d, s;
   output q;
   reg    q;
   always @(posedge clk)
   begin
      if (s)
         q <= 1'b1;
      else
         q <= d;
   end
endmodule


Following is Verilog code for the flip-flop with a positive-edge clock and clock enable.
module flop (clk, d, ce, q);
   input  clk, d, ce;
   output q;
   reg    q;
   always @(posedge clk)
   begin
      if (ce)
         q <= d;
   end
endmodule


Following is Verilog code for a 4-bit register with a positive-edge clock, asynchronous set and clock enable.
module flop (clk, d, ce, pre, q);
   input        clk, ce, pre;
   input  [3:0] d;
   output [3:0] q;
   reg    [3:0] q;
   always @(posedge clk or posedge pre)
   begin
      if (pre)
         q <= 4'b1111;
      else if (ce)
         q <= d;
   end
endmodule


Following is the Verilog code for a latch with a positive gate.
module latch (g, d, q);
   input  g, d;
   output q;
   reg    q;
   always @(g or d)
   begin
      if (g)
         q <= d;
   end
endmodule


Following is the Verilog code for a latch with a positive gate and an asynchronous clear.
module latch (g, d, clr, q);
   input  g, d, clr;
   output q;
   reg    q;
   always @(g or d or clr)
   begin
      if (clr)
         q <= 1'b0;
      else if (g)
         q <= d;
   end
endmodule


Following is Verilog code for a 4-bit latch with an inverted gate and an asynchronous preset.
module latch (g, d, pre, q);
   input        g, pre;
   input  [3:0] d;
   output [3:0] q;
   reg    [3:0] q;
   always @(g or d or pre)
   begin
      if (pre)
         q <= 4'b1111;
      else if (~g)
         q <= d;
   end
endmodule


Following is Verilog code for a tristate element using a combinatorial process and always block.
module three_st (t, i, o);
   input  t, i;
   output o;
   reg    o;
   always @(t or i)
   begin
      if (~t)
         o = i;
      else
         o = 1'bZ;
   end
endmodule


Following is the Verilog code for a tristate element using a concurrent assignment.
module three_st (t, i, o);
   input  t, i;
   output o;
   assign o = (~t) ? i : 1'bZ;
endmodule


Following is the Verilog code for a 4-bit unsigned up counter with asynchronous clear.
module counter (clk, clr, q);
   input        clk, clr;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 4'b0000;
      else
         tmp <= tmp + 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned down counter with synchronous set.
module counter (clk, s, q);
   input        clk, s;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk)
   begin
      if (s)
         tmp <= 4'b1111;
      else
         tmp <= tmp - 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned up counter with an asynchronous load from the primary input.
module counter (clk, load, d, q);
   input        clk, load;
   input  [3:0] d;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk or posedge load)
   begin
      if (load)
         tmp <= d;
      else
         tmp <= tmp + 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned up counter with a synchronous load with a constant.
module counter (clk, sload, q);
   input        clk, sload;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk)
   begin
      if (sload)
         tmp <= 4'b1010;
      else
         tmp <= tmp + 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned up counter with an asynchronous clear and a clock enable.
module counter (clk, clr, ce, q);
   input        clk, clr, ce;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 4'b0000;
      else if (ce)
         tmp <= tmp + 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned up/down counter with an asynchronous clear.
module counter (clk, clr, up_down, q);
   input        clk, clr, up_down;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 4'b0000;
      else if (up_down)
         tmp <= tmp + 1'b1;
      else
         tmp <= tmp - 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit signed up counter with an asynchronous reset.
module counter (clk, clr, q);
   input               clk, clr;
   output signed [3:0] q;
   reg    signed [3:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 4'b0000;
      else
         tmp <= tmp + 1'b1;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for a 4-bit unsigned up counter with an asynchronous reset and a modulo maximum.
module counter (clk, clr, q);
   parameter MAX_SQRT = 4, MAX = (MAX_SQRT*MAX_SQRT);
   input                 clk, clr;
   output [MAX_SQRT-1:0] q;
   reg    [MAX_SQRT-1:0] cnt;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         cnt <= 0;
      else
         cnt <= (cnt + 1) % MAX;
   end
   assign q = cnt;
endmodule


Following is the Verilog code for a 4-bit unsigned up accumulator with an asynchronous clear.
module accum (clk, clr, d, q);
   input        clk, clr;
   input  [3:0] d;
   output [3:0] q;
   reg    [3:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 4'b0000;
      else
         tmp <= tmp + d;
   end
   assign q = tmp;
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, serial in and serial out.
module shift (clk, si, so);
   input        clk, si;
   output       so;
   reg    [7:0] tmp;
   always @(posedge clk)
   begin
      tmp    <= tmp << 1;
      tmp[0] <= si;
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left register with a negative-edge clock, a clock enable, a serial in and a serial out.
module shift (clk, ce, si, so);
   input        clk, si, ce;
   output       so;
   reg    [7:0] tmp;
   always @(negedge clk)
   begin
      if (ce) begin
         tmp    <= tmp << 1;
         tmp[0] <= si;
      end
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, asynchronous clear, serial in and serial out.
module shift (clk, clr, si, so);
   input        clk, si, clr;
   output       so;
   reg    [7:0] tmp;
   always @(posedge clk or posedge clr)
   begin
      if (clr)
         tmp <= 8'b00000000;
      else
         tmp <= {tmp[6:0], si};
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, a synchronous set, a serial in and a serial out.
module shift (clk, s, si, so);
   input        clk, si, s;
   output       so;
   reg    [7:0] tmp;
   always @(posedge clk)
   begin
      if (s)
         tmp <= 8'b11111111;
      else
         tmp <= {tmp[6:0], si};
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, a serial in and a parallel out.
module shift (clk, si, po);
   input        clk, si;
   output [7:0] po;
   reg    [7:0] tmp;
   always @(posedge clk)
   begin
      tmp <= {tmp[6:0], si};
   end
   assign po = tmp;
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, an asynchronous parallel load, a serial in and a serial out.
module shift (clk, load, si, d, so);
   input        clk, si, load;
   input  [7:0] d;
   output       so;
   reg    [7:0] tmp;
   always @(posedge clk or posedge load)
   begin
      if (load)
         tmp <= d;
      else
         tmp <= {tmp[6:0], si};
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left register with a positive-edge clock, a synchronous parallel load, a serial in and a serial out.
module shift (clk, sload, si, d, so);
   input        clk, si, sload;
   input  [7:0] d;
   output       so;
   reg    [7:0] tmp;
   always @(posedge clk)
   begin
      if (sload)
         tmp <= d;
      else
         tmp <= {tmp[6:0], si};
   end
   assign so = tmp[7];
endmodule


Following is the Verilog code for an 8-bit shift-left/shift-right register with a positive-edge clock, a serial in and a parallel out.
module shift (clk, si, left_right, po);
   input        clk, si, left_right;
   output [7:0] po;
   reg    [7:0] tmp;
   always @(posedge clk)
   begin
      if (left_right == 1'b0)
         tmp <= {tmp[6:0], si};
      else
         tmp <= {si, tmp[7:1]};
   end
   assign po = tmp;
endmodule


Following is the Verilog code for a 4-to-1 1-bit MUX using an If statement.
        module mux (a, b, c, d, s, o);
        input        a, b, c, d;
        input  [1:0] s;
        output       o;
        reg          o;
        always @(a or b or c or d or s)
        begin
           if (s == 2'b00)
              o = a;
           else if (s == 2'b01)
              o = b;
           else if (s == 2'b10)
              o = c;
           else
              o = d;
        end
        endmodule
        


Following is the Verilog code for a 4-to-1 1-bit MUX using a Case statement.
        module mux (a, b, c, d, s, o);
        input        a, b, c, d;
        input  [1:0] s;
        output       o;
        reg          o;
        always @(a or b or c or d or s)
        begin
           case (s)
              2'b00   : o = a;
              2'b01   : o = b;
              2'b10   : o = c;
              default : o = d;
           endcase
        end
        endmodule
        


Following is the Verilog code for a 3-to-1 1-bit MUX with a 1-bit latch.
        module mux (a, b, c, d, s, o);
        input        a, b, c, d;
        input  [1:0] s;
        output       o;
        reg          o;
        // No final else: o keeps its value when s == 2'b11, so a latch is inferred
        always @(a or b or c or d or s)
        begin
           if (s == 2'b00)
              o = a;
           else if (s == 2'b01)
              o = b;
           else if (s == 2'b10)
              o = c;
        end
        endmodule
        


Following is the Verilog code for a 1-of-8 decoder.
        module decoder (sel, res);
        input  [2:0] sel;
        output [7:0] res;
        reg    [7:0] res;
        always @(sel)
        begin
           case (sel)
              3'b000  : res = 8'b00000001;
              3'b001  : res = 8'b00000010;
              3'b010  : res = 8'b00000100;
              3'b011  : res = 8'b00001000;
              3'b100  : res = 8'b00010000;
              3'b101  : res = 8'b00100000;
              3'b110  : res = 8'b01000000;
              default : res = 8'b10000000;
           endcase
        end
        endmodule
        


Following Verilog code also leads to the inference of a 1-of-8 decoder. Here the selector values 110 and 111 are unused, so their outputs are assigned don't-care values.
        module decoder (sel, res);
        input  [2:0] sel;
        output [7:0] res;
        reg    [7:0] res;
        always @(sel)
        begin
           case (sel)
              3'b000  : res = 8'b00000001;
              3'b001  : res = 8'b00000010;
              3'b010  : res = 8'b00000100;
              3'b011  : res = 8'b00001000;
              3'b100  : res = 8'b00010000;
              3'b101  : res = 8'b00100000;
              // 110 and 111 selector values are unused
              default : res = 8'bxxxxxxxx;
           endcase
        end
        endmodule
        


Following is the Verilog code for a 3-bit 1-of-9 Priority Encoder.
        module priority (sel, code);
        input  [7:0] sel;
        output [2:0] code;
        reg    [2:0] code;
        always @(sel)
        begin
           if (sel[0])
              code = 3'b000;
           else if (sel[1])
              code = 3'b001;
           else if (sel[2])
              code = 3'b010;
           else if (sel[3])
              code = 3'b011;
           else if (sel[4])
              code = 3'b100;
           else if (sel[5])
              code = 3'b101;
           else if (sel[6])
              code = 3'b110;
           else if (sel[7])
              code = 3'b111;
           else
              code = 3'bxxx;
        end
        endmodule
        


Following is the Verilog code for a logical shifter.
        module lshift (di, sel, so);
        input  [7:0] di;
        input  [1:0] sel;
        output [7:0] so;
        reg    [7:0] so;
        always @(di or sel)
        begin
           case (sel)
              2'b00   : so = di;
              2'b01   : so = di << 1;
              2'b10   : so = di << 2;
              default : so = di << 3;
           endcase
        end
        endmodule
        


Following is the Verilog code for an unsigned 8-bit adder with carry in.
 module adder(a, b, ci, sum);
 input  [7:0] a;
 input  [7:0] b;
 input        ci;
 output [7:0] sum;
        
    assign sum = a + b + ci;

        endmodule
        
Following is the Verilog code for an unsigned 8-bit adder with carry out.
 module adder(a, b, sum, co);
 input  [7:0] a;
 input  [7:0] b;
 output [7:0] sum;
 output       co;
 wire   [8:0] tmp;

    assign tmp = a + b;
    assign sum = tmp [7:0];
    assign co  = tmp [8];

        endmodule
        


Following is the Verilog code for an unsigned 8-bit adder with carry in and carry out.
        module adder(a, b, ci, sum, co);
        input        ci;
        input  [7:0] a;
        input  [7:0] b;
        output [7:0] sum;
        output       co;
        wire   [8:0] tmp;

           assign tmp = a + b + ci;
           assign sum = tmp [7:0];
           assign co  = tmp [8];

        endmodule
        


Following is the Verilog code for an unsigned 8-bit adder/subtractor.
        module addsub(a, b, oper, res);
        input        oper;
        input  [7:0] a;
        input  [7:0] b;
        output [7:0] res;
        reg    [7:0] res;
        always @(a or b or oper)
        begin
           if (oper == 1'b0)
              res = a + b;
           else
              res = a - b;
        end
        endmodule
        


Following is the Verilog code for an unsigned 8-bit greater or equal comparator.
        module compar(a, b, cmp);
        input  [7:0] a;
        input  [7:0] b;
        output       cmp;

           assign cmp = (a >= b) ? 1'b1 : 1'b0;

        endmodule
        


Following is the Verilog code for an unsigned 8x4-bit multiplier.
        module mult8x4(a, b, res);
        input  [7:0]  a;
        input  [3:0]  b;
        output [11:0] res;

           assign res = a * b;

        endmodule
        


Following Verilog template shows the multiplication operation placed outside the always block and the pipeline stages represented as single registers.
        module mult(clk, a, b, mult);
        input         clk;
        input  [17:0] a;
        input  [17:0] b;
        output [35:0] mult;
        reg    [35:0] mult;
        reg    [17:0] a_in, b_in;
        wire   [35:0] mult_res;
        reg    [35:0] pipe_1, pipe_2, pipe_3;

           assign mult_res = a_in * b_in;

        always @(posedge clk)
        begin
           a_in   <= a;
           b_in   <= b;
           pipe_1 <= mult_res;
           pipe_2 <= pipe_1;
           pipe_3 <= pipe_2;
           mult   <= pipe_3;
        end
        endmodule
        


Following Verilog template shows the multiplication operation placed inside the always block and the pipeline stages are represented as single registers.
        module mult(clk, a, b, mult);
        input         clk;
        input  [17:0] a;
        input  [17:0] b;
        output [35:0] mult;
        reg    [35:0] mult;
        reg    [17:0] a_in, b_in;
        reg    [35:0] mult_res;
        reg    [35:0] pipe_2, pipe_3;
        always @(posedge clk)
        begin
           a_in     <= a;
           b_in     <= b;
           mult_res <= a_in * b_in;
           pipe_2   <= mult_res;
           pipe_3   <= pipe_2;
           mult     <= pipe_3;
        end
        endmodule
        


Following Verilog template shows the multiplication operation placed outside the always block and the pipeline stages represented as shift registers.
        module mult3(clk, a, b, mult);
        input         clk;
        input  [17:0] a;
        input  [17:0] b;
        output [35:0] mult;
        reg    [35:0] mult;
        reg    [17:0] a_in, b_in;
        wire   [35:0] mult_res;
        reg    [35:0] pipe_regs [2:0];

           assign mult_res = a_in * b_in;

        always @(posedge clk)
        begin
           a_in <= a;
           b_in <= b;
           // Shift the product through the pipeline: pipe_regs[2] -> [1] -> [0] -> mult
           {pipe_regs[2], pipe_regs[1], pipe_regs[0]} <=
                 {mult_res, pipe_regs[2], pipe_regs[1]};
           mult <= pipe_regs[0];
        end
        endmodule
        


Following is a template that implements a multiplier-adder with 2 register levels on the multiplier inputs.
 module mvl_multaddsub1(clk, a, b, c, res);
 input         clk;
 input  [07:0] a;
 input  [07:0] b;
 input  [07:0] c;
 output [15:0] res;
 reg    [07:0] a_reg1, a_reg2, b_reg1, b_reg2;
 wire   [15:0] multaddsub;
 always @(posedge clk)
 begin
    a_reg1 <= a; 
    a_reg2 <= a_reg1;
    b_reg1 <= b; 
    b_reg2 <= b_reg1;
 end
    assign multaddsub = a_reg2 * b_reg2 + c;
    assign res = multaddsub;
        endmodule
        


Following is the Verilog code for resource sharing.
        module addsub(a, b, c, oper, res);
        input        oper;
        input  [7:0] a;
        input  [7:0] b;
        input  [7:0] c;
        output [7:0] res;
        reg    [7:0] res;
        always @(a or b or c or oper)
        begin
           if (oper == 1'b0)
              res = a + b;
           else
              res = a - c;
        end
        endmodule
        


Following template shows a single-port RAM in read-first mode.
        module raminfr (clk, en, we, addr, di, do);
        input        clk;
        input        we;
        input        en;
        input  [4:0] addr;
        input  [3:0] di;
        output [3:0] do;
        reg    [3:0] RAM [31:0];
        reg    [3:0] do;
        always @(posedge clk)
        begin
           if (en) begin
              if (we)
                 RAM[addr] <= di;
              do <= RAM[addr];
           end
        end
        endmodule
        


Following template shows a single-port RAM in write-first mode.
        module raminfr (clk, we, en, addr, di, do);
        input        clk;
        input        we;
        input        en;
        input  [4:0] addr;
        input  [3:0] di;
        output [3:0] do;
        reg    [3:0] RAM [31:0];
        reg    [4:0] read_addr;
        always @(posedge clk)
        begin
           if (en) begin
              if (we)
                 RAM[addr] <= di;
              read_addr <= addr;
           end
        end
           assign do = RAM[read_addr];
        endmodule
        


Following template shows a single-port RAM in no-change mode.
        module raminfr (clk, we, en, addr, di, do);
        input        clk;
        input        we;
        input        en;
        input  [4:0] addr;
        input  [3:0] di;
        output [3:0] do;
        reg    [3:0] RAM [31:0];
        reg    [3:0] do;
        always @(posedge clk)
        begin
           if (en) begin
              if (we)
                 RAM[addr] <= di;
              else
                 do <= RAM[addr];
           end
        end
        endmodule
        


Following is the Verilog code for a single-port RAM with asynchronous read.
        module raminfr (clk, we, a, di, do);
        input        clk;
        input        we;
        input  [4:0] a;
        input  [3:0] di;
        output [3:0] do;
        reg    [3:0] ram [31:0];
        always @(posedge clk)
        begin
           if (we)
              ram[a] <= di;
        end
           assign do = ram[a];
        endmodule
        


Following is the Verilog code for a single-port RAM with "false" synchronous read.
 module raminfr (clk, we, a, di, do);
 input        clk;
 input        we;
 input  [4:0] a;
 input  [3:0] di;
 output [3:0] do;
 reg    [3:0] ram [31:0];
 reg    [3:0] do;
 always @(posedge clk) 
 begin
    if (we)
       ram[a] <= di;
    do <= ram[a];
 end
        endmodule
        


Following is the Verilog code for a single-port RAM with synchronous read (read through).
 module raminfr (clk, we, a, di, do);
 input        clk;
 input        we;
 input  [4:0] a;
 input  [3:0] di;
 output [3:0] do;
 reg    [3:0] ram [31:0];
 reg    [4:0] read_a;
 always @(posedge clk) 
 begin
    if (we)
       ram[a] <= di;
    read_a <= a;
 end
    assign do = ram[read_a];
        endmodule
        


Following is the Verilog code for a single-port block RAM with enable.
        module raminfr (clk, en, we, a, di, do);
        input        clk;
        input        en;
        input        we;
        input  [4:0] a;
        input  [3:0] di;
        output [3:0] do;
        reg    [3:0] ram [31:0];
        reg    [4:0] read_a;
        always @(posedge clk)
        begin
           if (en) begin
              if (we)
                 ram[a] <= di;
              read_a <= a;
           end
        end
           assign do = ram[read_a];
        endmodule
        


Following is the Verilog code for a dual-port RAM with asynchronous read.
 module raminfr (clk, we, a, dpra, di, spo, dpo);
 input        clk;
 input        we;
 input  [4:0] a;
 input  [4:0] dpra;
 input  [3:0] di;
 output [3:0] spo;
 output [3:0] dpo;
 reg    [3:0] ram [31:0];
 always @(posedge clk) 
 begin
    if (we)
       ram[a] <= di;
 end
    assign spo = ram[a];
    assign dpo = ram[dpra];
        endmodule
        


Following is the Verilog code for a dual-port RAM with false synchronous read.
        module raminfr (clk, we, a, dpra, di, spo, dpo);
        input        clk;
        input        we;
        input  [4:0] a;
        input  [4:0] dpra;
        input  [3:0] di;
        output [3:0] spo;
        output [3:0] dpo;
        reg    [3:0] ram [31:0];
        reg    [3:0] spo;
        reg    [3:0] dpo;
        always @(posedge clk)
        begin
           if (we)
              ram[a] <= di;

           spo <= ram[a];
           dpo <= ram[dpra];
        end
        endmodule
        


Following is the Verilog code for a dual-port RAM with synchronous read (read through).
 module raminfr (clk, we, a, dpra, di, spo, dpo);
 input        clk;
 input        we;
 input  [4:0] a;
 input  [4:0] dpra;
 input  [3:0] di;
 output [3:0] spo;
 output [3:0] dpo;
 reg    [3:0] ram [31:0];
 reg    [4:0] read_a;
 reg    [4:0] read_dpra;
 always @(posedge clk) 
 begin
    if (we)
       ram[a] <= di;
    read_a <= a;
    read_dpra <= dpra;
 end
    assign spo = ram[read_a];
    assign dpo = ram[read_dpra];
        endmodule
        


Following is the Verilog code for a dual-port RAM with enable on each port.
        module raminfr (clk, ena, enb, wea, addra, addrb, dia, doa, dob);
        input        clk, ena, enb, wea;
        input  [4:0] addra, addrb;
        input  [3:0] dia;
        output [3:0] doa, dob;
        reg    [3:0] ram [31:0];
        reg    [4:0] read_addra, read_addrb;
        always @(posedge clk)
        begin
           if (ena) begin
              if (wea)
                 ram[addra] <= dia;
              read_addra <= addra;   // register the port A read address
           end
        end

        always @(posedge clk)
        begin
           if (enb)
              read_addrb <= addrb;
        end
           assign doa = ram[read_addra];
           assign dob = ram[read_addrb];
        endmodule
        


Following is Verilog code for a ROM with registered output.
        module rominfr (clk, en, addr, data);
        input            clk;
        input            en;
        input      [4:0] addr;
        output reg [3:0] data;
        always @(posedge clk)
        begin
           if (en)
              case (addr)
                 4'b0000: data <= 4'b0010;
                 4'b0001: data <= 4'b0010;
                 4'b0010: data <= 4'b1110;
                 4'b0011: data <= 4'b0010;
                 4'b0100: data <= 4'b0100;
                 4'b0101: data <= 4'b1010;
                 4'b0110: data <= 4'b1100;
                 4'b0111: data <= 4'b0000;
                 4'b1000: data <= 4'b1010;
                 4'b1001: data <= 4'b0010;
                 4'b1010: data <= 4'b1110;
                 4'b1011: data <= 4'b0010;
                 4'b1100: data <= 4'b0100;
                 4'b1101: data <= 4'b1010;
                 4'b1110: data <= 4'b1100;
                 4'b1111: data <= 4'b0000;
                 default: data <= 4'bXXXX;
              endcase
        end
        endmodule
        


Following is Verilog code for a ROM with registered address.
        module rominfr (clk, en, addr, data);
        input            clk;
        input            en;
        input      [4:0] addr;
        output reg [3:0] data;
        reg        [4:0] raddr;
        // The enable gates only the address register
        always @(posedge clk)
        begin
           if (en)
              raddr <= addr;
        end

        // Purely combinational lookup of the registered address (no latch)
        always @(raddr)
        begin
           case (raddr)
              4'b0000: data = 4'b0010;
              4'b0001: data = 4'b0010;
              4'b0010: data = 4'b1110;
              4'b0011: data = 4'b0010;
              4'b0100: data = 4'b0100;
              4'b0101: data = 4'b1010;
              4'b0110: data = 4'b1100;
              4'b0111: data = 4'b0000;
              4'b1000: data = 4'b1010;
              4'b1001: data = 4'b0010;
              4'b1010: data = 4'b1110;
              4'b1011: data = 4'b0010;
              4'b1100: data = 4'b0100;
              4'b1101: data = 4'b1010;
              4'b1110: data = 4'b1100;
              4'b1111: data = 4'b0000;
              default: data = 4'bXXXX;
           endcase
        end
        endmodule
        


Following is the Verilog code for an FSM with a single process.
        module fsm (clk, reset, x1, outp);
        input        clk, reset, x1;
        output       outp;
        reg          outp;
        reg    [1:0] state;
        parameter s1 = 2'b00; parameter s2 = 2'b01;
        parameter s3 = 2'b10; parameter s4 = 2'b11;
        always @(posedge clk or posedge reset)
        begin
           if (reset) begin
              state <= s1; outp <= 1'b1;
           end
           else begin
              case (state)
                 s1: begin
                    if (x1 == 1'b1) begin
                       state <= s2;
                       outp  <= 1'b1;
                    end
                    else begin
                       state <= s3;
                       outp  <= 1'b1;
                    end
                 end
                 s2: begin
                    state <= s4;
                    outp  <= 1'b0;
                 end
                 s3: begin
                    state <= s4;
                    outp  <= 1'b0;
                 end
                 s4: begin
                    state <= s1;
                    outp  <= 1'b1;
                 end
              endcase
           end
        end
        endmodule
        


Following is the Verilog code for an FSM with two processes.
        module fsm (clk, reset, x1, outp);
        input        clk, reset, x1;
        output       outp;
        reg          outp;
        reg    [1:0] state;
        parameter s1 = 2'b00; parameter s2 = 2'b01;
        parameter s3 = 2'b10; parameter s4 = 2'b11;
        always @(posedge clk or posedge reset)
        begin
           if (reset)
              state <= s1;
           else begin
              case (state)
                 s1: if (x1 == 1'b1)
                        state <= s2;
                     else
                        state <= s3;
                 s2: state <= s4;
                 s3: state <= s4;
                 s4: state <= s1;
              endcase
           end
        end
        always @(state) begin
           case (state)
              s1: outp = 1'b1;
              s2: outp = 1'b1;
              s3: outp = 1'b0;
              s4: outp = 1'b0;
           endcase
        end
        endmodule
        


Following is the Verilog code for an FSM with three processes.
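        module fsm (clk, reset, x1, outp);
        input        clk, reset, x1;
        output       outp;
        reg          outp;
        reg    [1:0] state;
        reg    [1:0] next_state;
        parameter s1 = 2'b00; parameter s2 = 2'b01;
        parameter s3 = 2'b10; parameter s4 = 2'b11;

        // Process 1: state register
        always @(posedge clk or posedge reset)
        begin
           if (reset)
              state <= s1;
           else
              state <= next_state;
        end

        // Process 2: next-state logic
        always @(state or x1)
        begin
           case (state)
              s1: if (x1 == 1'b1)
                     next_state = s2;
                  else
                     next_state = s3;
              s2: next_state = s4;
              s3: next_state = s4;
              s4: next_state = s1;
           endcase
        end

        // Process 3: output logic (same decoding as the two-process FSM)
        always @(state)
        begin
           case (state)
              s1: outp = 1'b1;
              s2: outp = 1'b1;
              s3: outp = 1'b0;
              s4: outp = 1'b0;
           endcase
        end
        endmodule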
        

Digital FAQ

1) Explain setup time and hold time. What happens if there is a setup or hold time violation, and how can it be overcome?
Setup time is the amount of time before the clock edge that the input signal needs to be stable to guarantee it is captured properly on the clock edge.
Hold time is the amount of time after the clock edge that the same input signal has to be held stable before changing, to make sure it is sensed properly at the clock edge.
Whenever there is a setup or hold time violation in a flip-flop, it can enter a state where its output is unpredictable; this state is known as the metastable (quasi-stable) state. At the end of the metastable state, the flip-flop settles down to either '1' or '0'. This whole process is known as metastability.

2) What is skew, what problems are associated with it, and how can it be minimized?
In circuit design, clock skew is a phenomenon in synchronous circuits in which the clock signal (sent from the clock circuit) arrives at different components at different times.
This is typically due to two causes. The first is a material flaw, which causes a signal to travel faster or slower than expected. The second is distance: if the signal has to travel the entire length of a circuit, it will likely (depending on the circuit's size) arrive at different parts of the circuit at different times.
Clock skew can cause harm in two ways. Suppose that a logic path travels through combinational logic from a source flip-flop to a destination flip-flop. If the destination flip-flop receives the clock tick later than the source flip-flop, and if the logic path delay is short enough, then the data signal might arrive at the destination flip-flop before the clock tick, overwriting the previous data that should have been clocked through. This is called a hold violation because the previous data is not held long enough at the destination flip-flop to be properly clocked through.
If the destination flip-flop receives the clock tick earlier than the source flip-flop, then the data signal has that much less time to reach the destination flip-flop before the next clock tick. If it fails to do so, a setup violation occurs, so called because the new data was not set up and stable before the next clock tick arrived. A hold violation is more serious than a setup violation because it cannot be fixed by increasing the clock period.
Clock skew, if done right, can also benefit a circuit. It can be intentionally introduced to decrease the clock period at which the circuit will operate correctly, and/or to increase the setup or hold safety margins. The optimal set of clock delays is determined by a linear program, in which a setup and a hold constraint appears for each logic path. In this linear program, zero clock skew is merely a feasible point.
Clock skew can be minimized by proper routing of the clock signal (a balanced clock distribution tree) or by inserting variable delay buffers so that the clock arrives at all clock inputs at the same time.

3) What is slack? 

'Slack' is the amount of time measured from when an event 'actually happens' to when it 'must happen'. The term 'actually happens' can also be taken as a predicted time for when the event will 'actually happen'.
When something 'must happen' can also be called a 'deadline', so another definition of slack is the time from when something 'actually happens' (call this Tact) until the deadline (call this Tdead).
Slack = Tdead - Tact.
Negative slack implies that the 'actually happens' time is later than the 'deadline' time; in other words, the event is too late and there is a timing violation that needs attention.

4) What is a glitch? What causes it (explain with a waveform)? How can it be overcome?

 

The following figure shows a synchronous alternative to the gated clock using a data path. The flip-flop is clocked on every clock cycle, and the data path is controlled by an enable. When the enable is Low, the multiplexer feeds the output of the register back to its input; when the enable is High, new data is fed to the flip-flop and the register changes its state.
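A minimal Verilog sketch of that enable-controlled data path follows. The module name ce_reg and the 8-bit width are illustrative assumptions. The 2:1 mux is written as a conditional assignment, so every flip-flop still sees every clock edge and no logic sits in the clock path.

```verilog
// Clock-enable register: synchronous alternative to a gated clock.
// Hypothetical module; the enable steers a 2:1 mux in the data path only.
module ce_reg (clk, en, d, q);
   input        clk, en;
   input  [7:0] d;
   output [7:0] q;
   reg    [7:0] q;
   wire   [7:0] next_q;

   // en Low: feed q back (hold); en High: select new data d
   assign next_q = en ? d : q;

   // The flip-flops are clocked on every edge of the free-running clock
   always @(posedge clk)
      q <= next_q;
endmodule
```

Most FPGA flip-flops have a dedicated clock-enable pin, so synthesis tools usually map this feedback mux onto that pin rather than building it from LUTs.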


 

5) Given only two XOR gates, make one function as a buffer and the other as an inverter.

Tie one input of an XOR gate to 1 and it acts as an inverter.
Tie one input of the other XOR gate to 0 and it acts as a buffer.
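The same trick can be written as gate-level Verilog. The module and port names below are hypothetical.

```verilog
// Two XOR gates: one wired as a buffer, one as an inverter (hypothetical module)
module xor_buf_inv (a, buf_out, inv_out);
   input  a;
   output buf_out, inv_out;

   xor u_buf (buf_out, a, 1'b0);  // a ^ 0 = a   (buffer)
   xor u_inv (inv_out, a, 1'b1);  // a ^ 1 = ~a  (inverter)
endmodule
```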

6) What is the difference between a latch and a flip-flop?

The main difference is that latches are level-sensitive, while flip-flops are edge-sensitive. Both use a clock (or enable) signal, and both are used in sequential logic. For a positive-level latch, the output tracks the input while the clock signal is High, so as long as the clock is logic 1, the output changes whenever the input changes. A flip-flop, on the other hand, stores the input only on a rising (or falling) edge of the clock.
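The sketch below makes the difference concrete. It models a positive-level latch and a rising-edge flip-flop driven by the same signals. The module latch_vs_ff is a hypothetical example, not a vendor template.

```verilog
// Hypothetical side-by-side model of a latch and a flip-flop
module latch_vs_ff (clk, d, q_latch, q_ff);
   input  clk, d;
   output q_latch, q_ff;
   reg    q_latch, q_ff;

   // Level-sensitive: transparent while clk is High, holds while clk is Low
   always @(clk or d)
      if (clk)
         q_latch <= d;

   // Edge-sensitive: samples d only on the rising edge of clk
   always @(posedge clk)
      q_ff <= d;
endmodule
```

If d changes while clk is High, q_latch follows it immediately, but q_ff keeps the value captured at the last rising edge.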

7) Build a 4:1 mux using only 2:1 muxes.
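A common construction uses three 2:1 muxes. In the first level, two muxes are steered by s[0]. In the second level, one mux, steered by s[1], picks between their outputs. The Verilog sketch below assumes the same select encoding as the 4-to-1 MUX templates above (s = 00 selects a, 01 selects b, 10 selects c, 11 selects d). The module names are hypothetical.

```verilog
// 2:1 mux building block (hypothetical)
module mux2 (a, b, s, y);
   input  a, b, s;
   output y;
   assign y = s ? b : a;   // s = 0 selects a, s = 1 selects b
endmodule

// 4:1 mux from three 2:1 muxes: s[0] picks within each pair,
// s[1] picks between the two pair results.
module mux4_from_mux2 (a, b, c, d, s, y);
   input        a, b, c, d;
   input  [1:0] s;
   output       y;
   wire         ab, cd;

   mux2 u0 (.a(a),  .b(b),  .s(s[0]), .y(ab));  // a or b
   mux2 u1 (.a(c),  .b(d),  .s(s[0]), .y(cd));  // c or d
   mux2 u2 (.a(ab), .b(cd), .s(s[1]), .y(y));   // final selection
endmodule
```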

 

8) What is the difference between a heap and a stack?

The stack is responsible for keeping track of what is executing in our code (or what has been "called"). The heap is responsible for keeping track of our objects (most of our data).
Think of the stack as a series of boxes stacked one on top of the next. We keep track of what is going on in our application by stacking another box (called a frame) on top every time we call a method. We can only use what is in the top box on the stack. When we are done with the top box (the method has finished executing), we throw it away and proceed to use the contents of the previous box, which is now on top of the stack. The heap is similar, except that its purpose is to hold information rather than keep track of execution, so anything in the heap can be accessed at any time; unlike the stack, there are no constraints on what can be accessed. The heap is like the heap of clean laundry on our bed that we have not yet put away: we can grab what we need quickly. The stack is like the stack of shoe boxes in the closet, where we have to take off the top one to get to the one underneath it.

9) What is the difference between a Mealy and a Moore state machine?

A) Mealy and Moore models are the basic models of state machines. A state machine that uses only entry actions, so that its output depends only on the state, is called a Moore model. A state machine that uses only input actions, so that its output depends on both the state and the inputs, is called a Mealy model. The model selected will influence a design, but there is no general indication of which model is better. The choice depends on the application, the means of execution (for instance, hardware systems are usually best realized as Moore models), and the personal preferences of the designer or programmer.

B) A Mealy machine has outputs that depend on the state and the input (thus, the FSM diagram has the outputs written on the edges).
A Moore machine has outputs that depend on the state only (thus, the FSM diagram has the outputs written in the states themselves).

Advantages and disadvantages
In a Mealy machine, the output is a function of both the inputs and the state. Because changes of the state variables are delayed with respect to changes of the input signal levels, glitches can appear on the outputs. A Moore machine avoids these glitches because its outputs depend only on the state and not on the input signal levels.
All of the concepts can be applied to Moore-model state machines because any Moore state machine can be implemented as a Mealy state machine, although the converse is not true.
Moore machine: the outputs are properties of the states themselves, which means that you get an output after the machine reaches a particular state; to get some output, the machine has to be taken to a state that provides it. The outputs are held until the machine goes to some other state.
Mealy machine: the outputs are produced instantly, that is, immediately upon receiving the input, but an output is not held after that clock cycle.
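A small sketch of this timing difference uses a rising-edge detector as an assumed example (the module and signal names are hypothetical). The Mealy output rise_mealy is a combinational function of the current input and the stored state, so it responds as soon as din changes. The Moore-style output rise_moore is itself a state bit, so it appears only after the next clock edge and is then held for a full cycle.

```verilog
// Hypothetical rising-edge detector with a Mealy and a Moore-style output
module edge_detect (clk, reset, din, rise_mealy, rise_moore);
   input  clk, reset, din;
   output rise_mealy, rise_moore;
   reg    din_q;        // state: value of din at the previous clock edge
   reg    rise_moore;   // Moore: the "rise seen" indication is a state bit

   always @(posedge clk or posedge reset)
      if (reset) begin
         din_q      <= 1'b0;
         rise_moore <= 1'b0;
      end
      else begin
         din_q      <= din;
         rise_moore <= din & ~din_q;
      end

   // Mealy: depends on the state (din_q) AND the current input (din)
   assign rise_mealy = din & ~din_q;
endmodule
```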

10) What is the difference between one-hot and binary encoding?

Common classifications used to describe the state encoding of an FSM are binary (or highly encoded) and one-hot.
A binary-encoded FSM design requires only as many flip-flops as are needed to uniquely encode the number of states in the state machine. The actual number of flip-flops required is the ceiling of the log base 2 of the number of states in the FSM.
A one-hot FSM design requires one flip-flop for each state, and only one flip-flop (the one representing the current or "hot" state) is set at a time. For a state machine with 9 to 16 states, a binary FSM requires only 4 flip-flops, while a one-hot FSM requires 9 to 16 flip-flops, one per state.
FPGA vendors frequently recommend a one-hot state encoding style because flip-flops are plentiful in an FPGA, and the combinational logic required to implement a one-hot FSM is typically smaller than for most binary encoding styles. Since FPGA performance is typically related to the size of the combinational logic, one-hot FSMs typically run faster than binary-encoded FSMs with larger combinational logic blocks.
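For comparison, the four-state machine from the FSM templates above can be hand-coded with one-hot encoding, as sketched below. It uses four flip-flops instead of the two needed for binary encoding. The output decoding follows the two-process template (outp is High in s1 and s2). This is an illustration under those assumptions, and the module name fsm_onehot is hypothetical. Synthesis tools can often re-encode a binary FSM as one-hot on their own.

```verilog
// Hypothetical one-hot version of the 4-state FSM from the templates
module fsm_onehot (clk, reset, x1, outp);
   input        clk, reset, x1;
   output       outp;
   reg    [3:0] state;                  // one flip-flop per state
   parameter S1 = 0, S2 = 1, S3 = 2, S4 = 3;  // bit position of each state

   always @(posedge clk or posedge reset)
   begin
      if (reset)
         state <= 4'b0001;              // only s1 is hot
      else begin
         // Each next-state bit is a small AND/OR of its predecessor bits
         state[S1] <= state[S4];
         state[S2] <= state[S1] &  x1;
         state[S3] <= state[S1] & ~x1;
         state[S4] <= state[S2] | state[S3];
      end
   end

   // Output decoding is a single OR gate
   assign outp = state[S1] | state[S2];
endmodule
```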

11) What are the different ways to synchronize between two clock domains?

Clock Domain Crossing. . .



The following section explains clock domain interfacing.

One of the biggest challenges of system-on-chip (SoC) designs is that different blocks operate on independent clocks. Integrating these blocks via the processor bus, memory ports, peripheral buses, and other interfaces can be troublesome, because unpredictable behavior can result when the asynchronous interfaces are not properly synchronized.

A very common and robust method for synchronizing multiple data signals is the handshake technique, as shown in the diagram below. It is popular because it easily handles changes in clock frequencies while minimizing latency at the crossing. However, handshake logic is significantly more complex than standard synchronization structures.



FSM1 (transmitter) asserts the req (request) signal, asking the receiver to accept the data on the data bus. FSM2 (receiver), generally a slower module, asserts the ack (acknowledge) signal, signifying that it has accepted the data.

This scheme has a loophole: the receiver samples the transmitter's req line, and the transmitter samples the receiver's ack line, each with respect to its own internal clock, so setup and hold time violations can occur. To avoid this, we use double- or triple-stage synchronizers. These increase the MTBF and so greatly reduce, though never fully eliminate, the risk of metastability failures. The figure below shows how this is done.
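The sketch below shows a two-stage synchronizer of the kind described above. It assumes a single-bit control signal such as req or ack and an active-Low asynchronous reset; both are assumptions. Multi-bit buses should cross through a handshake or an asynchronous FIFO rather than through per-bit synchronizers. The module and port names are hypothetical.

```verilog
// Hypothetical two-flop synchronizer for one control bit (e.g. req or ack)
module sync_2ff (clk_dst, rst_n, async_in, sync_out);
   input  clk_dst;    // clock of the receiving domain
   input  rst_n;      // active-Low asynchronous reset
   input  async_in;   // signal launched from the other clock domain
   output sync_out;
   reg    meta, sync_out;

   always @(posedge clk_dst or negedge rst_n)
   begin
      if (!rst_n) begin
         meta     <= 1'b0;
         sync_out <= 1'b0;
      end
      else begin
         meta     <= async_in;  // first stage may go metastable
         sync_out <= meta;      // second stage gives it a clock period to settle
      end
   end
endmodule
```

A third stage (another register after sync_out) can be added when the MTBF of two stages is not enough at the target clock frequency.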

 


Blocking vs Non-Blocking. . .



Self-triggering blocks:

module osc2 (clk);
output clk;
reg clk;
initial #10 clk = 0;
always @(clk) #10 clk <= ~clk;
endmodule

After the first @(clk) trigger, the RHS expression of the nonblocking assignment is evaluated and the LHS value scheduled into the nonblocking assign updates event queue.
Before the nonblocking assign updates event queue is "activated," the @(clk) trigger statement is encountered and the always block again becomes sensitive to changes on the clk signal. When the nonblocking LHS value is updated later in the same time step, the @(clk) is again triggered.

module osc1 (clk);
output clk;
reg clk;
initial #10 clk = 0;
always @(clk) #10 clk = ~clk;
endmodule

Blocking assignments evaluate their RHS expression and update their LHS value without interruption. The blocking assignment must complete before the @(clk) edge-trigger event can be scheduled. By the time the trigger event has been scheduled, the blocking clk assignment has completed; therefore, there is no trigger event from within the always block to trigger the @(clk) trigger.

Bad modeling (using blocking assignments for sequential logic):

always @(posedge clk) begin
q1 = d;
q2 = q1;
q3 = q2;
end

Race condition (the result depends on the order in which the simulator executes the always blocks):
always @(posedge clk) q1=d;
always @(posedge clk) q2=q1;
always @(posedge clk) q3=q2;

always @(posedge clk) q2=q1;
always @(posedge clk) q3=q2;
always @(posedge clk) q1=d;

always @(posedge clk) begin
q3 = q2;
q2 = q1;
q1 = d;
end
Bad style, but still works (each register reads its source before that source is overwritten)

 

Good modeling:

always @(posedge clk) begin
q1 <= d;
q2 <= q1;
q3 <= q2;
end

always @(posedge clk) begin
q3 <= q2;
q2 <= q1;
q1 <= d;
end

With nonblocking assignments, the order of the always blocks does not matter:
always @(posedge clk) q1<=d;
always @(posedge clk) q2<=q1;
always @(posedge clk) q3<=q2;

always @(posedge clk) q2<=q1;
always @(posedge clk) q3<=q2;
always @(posedge clk) q1<=d;

Good combinational logic (blocking):

always @(a or b or c or d) begin
tmp1 = a & b;
tmp2 = c & d;
y = tmp1 | tmp2;
end
Bad combinational logic (nonblocking):

always @(a or b or c or d) begin  // will simulate incorrectly:
tmp1 <= a & b;                    // tmp1 and tmp2 would also need to be
tmp2 <= c & d;                    // in the sensitivity list
y <= tmp1 | tmp2;
end

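Mixed design (sequential and combinational logic in the same always block):

Use nonblocking assignments.
In case of multiple nonblocking assignments to the same variable in the same time step, the last one wins.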

Verilog FSM



 




12) How do you calculate the maximum operating frequency?

13) How do you find the longest path?

You can find the answer to this in timing.ppt in the presentations section of this site.

14) Draw the state diagram to output a "1" for one cycle if the sequence "0110" shows up (the leading 0s cannot be used in more than one sequence)?
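One possible answer is sketched below as a Moore machine in the style of the three-process FSM template, with the state diagram encoded in the next-state case statement. The state names are hypothetical. The sketch assumes that "the leading 0s cannot be used in more than one sequence" means detections do not overlap: after 0110 is found, its final 0 is not reused as the start of the next sequence. The output is High for one cycle, in the clock cycle after the last 0 is sampled.

```verilog
// Hypothetical non-overlapping "0110" detector (Moore FSM)
module seq0110 (clk, reset, din, dout);
   input        clk, reset, din;
   output       dout;
   reg    [2:0] state, next_state;
   // IDLE: nothing useful seen; G0/G01/G011: prefix seen; DET: 0110 found
   parameter IDLE = 3'd0, G0 = 3'd1, G01 = 3'd2, G011 = 3'd3, DET = 3'd4;

   always @(posedge clk or posedge reset)
      if (reset) state <= IDLE;
      else       state <= next_state;

   always @(state or din)
   begin
      case (state)
         IDLE:    next_state = din ? IDLE : G0;
         G0:      next_state = din ? G01  : G0;
         G01:     next_state = din ? G011 : G0;
         G011:    next_state = din ? IDLE : DET;
         // DET's trailing 0 is not reused, so DET behaves like IDLE
         DET:     next_state = din ? IDLE : G0;
         default: next_state = IDLE;
      endcase
   end

   assign dout = (state == DET);  // Moore output, High for one cycle
endmodule
```

With the input stream 0110110, an overlapping detector would fire twice (the middle 0 is shared). This one fires once, because the 0 that ends the first match cannot start the second.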


15) How to achieve 180 degree exact phase shift?


Never answer "use an inverter": the inverter adds its own delay, so the shift would not be exactly 180 degrees.
a) DCMs (digital clock managers), a built-in resource in most FPGAs, can be configured to produce a 180-degree phase shift.
b) BUFGDS, differential signaling buffers that are also a built-in resource in most FPGAs, can be used.

16) What is the significance of RAS and CAS in SDRAM?

SDRAM receives its address in two address words.
It uses a multiplexing scheme to save input pins. The first address word (the row address) is latched into the DRAM chip with the row address strobe (RAS).
Following the RAS command, the column address strobe (CAS) latches the second address word (the column address).
After the RAS and CAS commands, once the CAS latency has elapsed, the stored data is valid for reading.
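The pin-sharing idea can be sketched as a selector that drives the row bits during the RAS phase and the column bits during the CAS phase. The module name, the 13-bit row and 10-bit column widths, and the col_phase control are hypothetical; bank bits and the other command pins are left out.

```verilog
// Assumes ROW_BITS > COL_BITS (true for the defaults)
module sdram_addr_mux #(parameter ROW_BITS = 13, COL_BITS = 10) (
  input  wire [ROW_BITS+COL_BITS-1:0] addr,      // flat address from the user side
  input  wire                         col_phase, // 0: RAS (row) phase, 1: CAS (column) phase
  output wire [ROW_BITS-1:0]          pins       // shared address pins
);
  wire [ROW_BITS-1:0] row = addr[ROW_BITS+COL_BITS-1:COL_BITS];
  wire [COL_BITS-1:0] col = addr[COL_BITS-1:0];

  // The same pins carry the row first, then the zero-extended column
  assign pins = col_phase ? {{(ROW_BITS-COL_BITS){1'b0}}, col} : row;
endmodule
```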

17) Name some applications of buffers.

a) They are used to introduce small delays.
b) They are used to reduce crosstalk caused by coupling (inter-electrode) capacitance between closely routed wires.
c) They are used to drive high-fanout nets, e.g. BUFG.


18) Implement an AND gate using a mux.

This is a basic question that many interviewers ask. For an AND gate, use one input as the select line. If b is the select line, connect the input chosen when b = 0 to logic '0' and the input chosen when b = 1 to a, so the output is a when b = 1 and 0 when b = 0.
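The mux answer can be checked with a short sketch; mux2 and and_from_mux are hypothetical names for a generic 2:1 mux and the AND gate built from it.

```verilog
module mux2 (input i0, input i1, input sel, output y);
  assign y = sel ? i1 : i0;   // sel = 0 -> i0, sel = 1 -> i1
endmodule

module and_from_mux (input a, input b, output y);
  // b = 0 selects the constant 0; b = 1 passes a through, so y = a & b
  mux2 u_mux (.i0(1'b0), .i1(a), .sel(b), .y(y));
endmodule
```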

19) What happens when the contents of a register are shifted left or right?

The straightforward answer: in a left shift all bits move one position left and a 0 is shifted into the LSB; in a right shift all bits move one position right and a 0 is shifted into the MSB.

What is expected is that a left shift multiplies the value by 2, e.g. 0000_1110 = 14 shifted left becomes 0001_1100 = 28. In the same fashion, a right shift divides an unsigned value by 2, discarding any remainder.
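A sketch with unsigned 8-bit registers shows both effects, plus two edge cases the short answer glosses over: the remainder is lost on a right shift, and a left shift only doubles the value while no 1 is shifted out of the MSB. The module name shift_demo is hypothetical.

```verilog
module shift_demo;
  reg [7:0] v, l14, r14, r15, l200;
  initial begin
    v    = 8'b0000_1110;  // 14
    l14  = v << 1;        // 8'b0001_1100 = 28 (x2)
    r14  = v >> 1;        // 8'b0000_0111 = 7  (/2)
    r15  = 8'd15 >> 1;    // 7: the old LSB (the remainder) is dropped
    l200 = 8'd200 << 1;   // 144, not 400: the MSB is shifted out of 8 bits
    $display("14<<1=%0d 14>>1=%0d 15>>1=%0d 200<<1=%0d", l14, r14, r15, l200);
  end
endmodule
```

This prints `14<<1=28 14>>1=7 15>>1=7 200<<1=144`.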

20) Given the following FIFO and rules, how deep does the FIFO need to be to prevent underflow or overflow?

RULES:
1) frequency(clk_A) = frequency(clk_B) / 4
2) period(en_B) = period(clk_A) * 100
3) duty_cycle(en_B) = 25%


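Assume clk_B = 100 MHz (10 ns).
From (1), clk_A = 25 MHz (40 ns).
From (2), period(en_B) = 40 ns * 100 = 4000 ns, but due to (3) we only read out for 1000 ns of that period, so for 3000 ns of each en_B period no data is read.
Words keep arriving at one per clk_A period during that time, so FIFO size = 3000 ns / 40 ns = 75 entries. During the 1000 ns read window, 100 words are read while 25 more are written, which drains exactly those 75 entries, so the FIFO neither overflows nor underflows.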
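The 75-entry figure can be cross-checked with a slot-by-slot occupancy model. This is a behavioral sketch, not synthesizable code, and since the FIFO figure is missing it rests on assumptions taken from the worked answer: one write on every clk_A edge, one read on every clk_B edge while en_B is high, and en_B low for the first 3000 ns of its period. The module name fifo_q20_model is hypothetical.

```verilog
module fifo_q20_model;
  localparam T_B     = 10;         // clk_B period in ns (100 MHz, assumed)
  localparam T_A     = 4 * T_B;    // rule 1: clk_A is 4x slower -> 40 ns
  localparam T_EN    = 100 * T_A;  // rule 2: en_B period = 4000 ns
  localparam T_EN_HI = T_EN / 4;   // rule 3: 25% duty cycle -> 1000 ns high

  integer t, occ, peak, low;

  initial begin
    occ = 0; peak = 0; low = 0;
    // Step through one en_B period in 10 ns (clk_B) slots
    for (t = 0; t < T_EN; t = t + T_B) begin
      if (t % T_A == 0)        occ = occ + 1;  // write on each clk_A edge
      if (t >= T_EN - T_EN_HI) occ = occ - 1;  // read on each clk_B edge while en_B is high
      // a write and a read in the same slot cancel out before the check
      if (occ > peak) peak = occ;
      if (occ < low)  low  = occ;
    end
    $display("peak occupancy = %0d, final = %0d, lowest = %0d", peak, occ, low);
  end
endmodule
```

It should report a peak occupancy of 75 and a final and lowest occupancy of 0, consistent with the 75-entry answer and with the FIFO never underflowing.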