This workshop aims to teach:
- Verilog coding guidelines
- Synthesizable Verilog code
- Synthesis
- Optimizations
- Synthesis-Simulation mismatch
- If statements
- For loop and For generate.
We will achieve the above using iverilog for simulation and yosys for synthesis. In addition, we will be using the sky130 standard cell library, which tells the synthesis tool which cells are available to implement our design. The library also contains details such as power consumption, delay and area for each cell.
Transistor-level design connects transistors into circuits to build gates or other components. Combinational or sequential circuits whose building blocks are primarily logic gates are thus called logic-level design. Designing circuits whose building blocks are registers and other datapath components consists of transferring data from registers through other datapath components like adders and back to registers. Such design is thus called register-transfer level design or RTL design.
- iverilog, or Icarus Verilog, is a simulation and synthesis tool. We will use it only for simulation and do the synthesis with yosys. We will view the simulation results in a waveform viewer called gtkwave, which reads a .vcd file to draw the waveform. VCD stands for value change dump: the file records every change in signal values during the simulation.
- A testbench provides these 'changes' in values. A testbench is a Verilog program that checks the functionality of our design by giving various possible inputs to the design.
- Let's take an example of a counter which counts from 0 to 2:
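A minimal sketch of what such a counter and its testbench could look like. The port names, the reset behaviour and the clock period are assumptions made for illustration; only the file and module names good_counter and tb_good_counter come from the commands used in this section. Each module would go in its own file (good_counter.v and tb_good_counter.v).

```verilog
// Hypothetical sketch: a counter that counts 0 -> 1 -> 2 -> 0.
module good_counter (
    input  wire       clk,
    input  wire       reset,   // assumed synchronous, active-high reset
    output reg  [1:0] count
);
    always @(posedge clk) begin
        if (reset || count == 2'd2)
            count <= 2'd0;           // reset, or wrap to 0 after reaching 2
        else
            count <= count + 2'd1;
    end
endmodule

// Minimal testbench: drives clock and reset and dumps a VCD file.
module tb_good_counter;
    reg        clk   = 1'b0;
    reg        reset = 1'b1;
    wire [1:0] count;

    good_counter uut (.clk(clk), .reset(reset), .count(count));

    always #5 clk = ~clk;            // clock period of 10 time units

    initial begin
        $dumpfile("tb_good_counter.vcd");   // name of the waveform file
        $dumpvars(0, tb_good_counter);      // dump every signal in the testbench
        #12  reset = 1'b0;                  // release reset after the first rising edge
        #100 $finish;
    end
endmodule
```

The `$dumpfile` call is what decides the name of the .vcd file that gtkwave opens later.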
- To run these files using iverilog, use the following command:
iverilog good_counter.v tb_good_counter.v
- If there are no errors, this creates an 'a.out' file in your current working directory. 'a.out' is the compiled simulation. Running it creates the .vcd file that will be used to view the waveform. To run the 'a.out' file, use either of the following commands:
./a.out
or
vvp a.out
- Note that the .vcd file name comes from the `$dumpfile` call in the testbench; here it is tb_good_counter.vcd, as used in the next step.
- Once the .vcd file has been generated, we can finally view the output using gtkwave:
gtkwave tb_good_counter.vcd
- We can verify the counter operation using this waveform. You can also change the colour of individual signals for convenience.
- We will be using Yosys for the synthesis of our Verilog designs. It will convert our RTL design to a gate-level netlist.
- Yosys uses a synthesis script to read a design from a Verilog file, synthesise it to a gate-level netlist using the cell library, and write the synthesised result as a Verilog netlist. In this workshop we type the script commands one by one in the Yosys shell.
- We will consider the same example as mentioned above.
- To start Yosys just type
yosys
in the terminal.
- We will start by reading our library files; this lets the synthesizer know what standard cells we are using:
read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- According to Yosys documentation:
- read_liberty: Read cells from liberty file as modules into current design.
- -lib: Only create empty blackbox modules.
- Note that the path to the .lib file may differ on your machine.
- Now we will read our Verilog file:
read_verilog good_counter.v
- After reading the Verilog file, Yosys will let you know if there are any errors or warnings.
- We will now synthesize the design:
synth -top good_counter
- According to Yosys documentation:
- synth: This command runs the default synthesis script. This command does not operate on partly selected designs.
- -top: Use the specified module as top module.
- Now we will map the design to a specified cell library (in our case, the sky130 library):
abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- According to Yosys documentation:
- abc: This pass uses the ABC tool for technology mapping of yosys's internal gate library to a target architecture.
- -liberty: generate netlists for the specified cell library (using the liberty file format).
- To view the resulting design we will simply type
show
- The various blocks present in the design are nothing but standard cells which are specified in the .lib file.
- One observation made during this workshop is that buffers are inserted in the design when synthesising combinational circuits, but not when synthesising sequential circuits. Here is an example of a mux:
- I believe that this is due to the .lib file we are using.
- Now we can write the netlist for the design. A netlist is a gate-level description of your design that specifies the components and their interconnections.
write_verilog -noattr good_counter_net.v
- Use the -noattr option to make the netlist easier to read. Here is a comparison between using and not using -noattr.
- sky130_fd_sc_hd__tt_025C_1v80.lib:
- sky130 : 130 nm technology
- tt : typical process corner
- 025C : 25°C (operating temperature)
- 1v80 : 1.80 V (operating voltage)
- The .lib file is a collection of logical modules. It contains different 'flavours' of gates:
- slow
- fast
- They refer to the logic gate's speed (propagation delay). We use different gates depending on our setup and hold requirements: faster cells help meet setup time, whereas slower cells help meet hold time.
- Setup and hold times create a window around the clock edge in which the data must not toggle; otherwise the flop can become metastable (its output is uncertain).
- In practice, setup analysis is used to improve performance and hold analysis is used to prevent data corruption.
- What makes these cells fast or slow:
- A cell's delay is set by how quickly it can charge and discharge its load capacitance.
- Faster cells use wider transistors, which source more current; slower cells use narrower transistors.
- This also makes the faster cells occupy more area and consume more power.
- We might also need to use slow cells due to the maximum operating frequency of the design. The maximum frequency is limited by the critical path (the path whose propagation delay is the highest).
- PVT stands for Process, Voltage, Temperature. These three parameters are among the first things you see when you open the library file.
- You can look for specific cells in the file with:
:/<keyword>
- Example: let's look for an a2111o cell. The function of this cell is ((a&b)|c|d|e).
:/a2111o
- The file specifies the leakage power, value and delay for all 32 input combinations (2^5, since the cell has five inputs).
- We will now compare the different flavours of the same gate:
- We can observe that the leakage power and area increase from left to right. They are all and2 gates, but each uses a different transistor width. As the width of the transistors increases, so do the area and power of the cell.
- Sometimes a design contains multiple modules. These modules form a hierarchy, with the top module being the one that instantiates the other modules.
- Example: a full adder consisting of two half adders. Here the half adder module is instantiated twice within the full adder module.
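As a sketch of this hierarchy, the full adder below instantiates a half adder twice. The module, instance and port names are illustrative assumptions, and the small testbench simply checks all eight input combinations against ordinary addition.

```verilog
// Leaf module: instantiated twice by full_adder.
module half_adder (
    input  wire a,
    input  wire b,
    output wire sum,
    output wire carry
);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// Top module of this small hierarchy.
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    wire s1, c1, c2;

    half_adder ha1 (.a(a),  .b(b),   .sum(s1),  .carry(c1));
    half_adder ha2 (.a(s1), .b(cin), .sum(sum), .carry(c2));

    assign cout = c1 | c2;
endmodule

// Self-checking testbench: compares against a + b + cin.
module tb_full_adder;
    reg  a, b, cin;
    wire sum, cout;
    integer i;

    full_adder uut (.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            {a, b, cin} = i;         // low three bits of i
            #1;
            if ({cout, sum} !== a + b + cin)
                $display("MISMATCH: a=%b b=%b cin=%b", a, b, cin);
        end
        $display("check complete");
        $finish;
    end
endmodule
```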
- Let's consider another example where we are trying to implement this logic:
- The code is:
- It consists of two sub-modules, one for the OR operation and another for the AND operation. Both are instantiated in the top module.
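A minimal sketch of such a design, assuming the logic is (a&b)|c as used later in this document. The names sub_module1, sub_module2 and the port names are assumptions; multiple_modules is the top module passed to synth.

```verilog
// Sub-module for the AND operation (hypothetical name).
module sub_module1 (input wire a, input wire b, output wire y);
    assign y = a & b;
endmodule

// Sub-module for the OR operation (hypothetical name).
module sub_module2 (input wire a, input wire b, output wire y);
    assign y = a | b;
endmodule

// Top module: y = (a & b) | c, built from the two sub-modules.
module multiple_modules (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    wire net1;   // output of the AND stage

    sub_module1 u1 (.a(a),    .b(b), .y(net1));
    sub_module2 u2 (.a(net1), .b(c), .y(y));
endmodule
```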
- We will now synthesise it using Yosys.
- First read the library file to import standard cells.
- Then read the Verilog file.
- Synthesise the design:
synth -top multiple_modules
- This will show you all the sub-modules in your design along with your design hierarchy. In addition, we can observe that the report mentions which cells will be used in the sub-modules.
- We will now map the standard cells to the design using
abc -liberty
followed by the path to the .lib file, as before.
- We are now ready to view the logic diagram:
- Compared to the logic circuit we designed earlier, we can observe that the diagram shows the sub-modules rather than AND and OR cells directly, because Yosys has preserved the hierarchy.
- When we write the netlist we can see that the hierarchy is preserved again.
- One interesting thing that can be observed from the netlist is that instead of using an OR gate, Yosys decided to use a NAND gate with inverted inputs.
- The reason for this is not area or power: according to the .lib file, the combined area of the inverters and the NAND gate is greater than that of an OR gate.
- The real reason to choose NAND gates over OR gates is the arrangement of PMOS transistors inside the gates. In practice, we avoid PMOS transistors in series. The mobility of holes in PMOS is lower than the mobility of electrons in NMOS, so a PMOS transistor already needs a larger W/L ratio than an NMOS transistor to provide the same drive; stacking PMOS transistors in series increases the resistance further and slows down switching. Therefore we prefer to keep PMOS transistors in parallel, as in a NAND gate.
- OR gate:
- NAND gate:
- In flat synthesis, the hierarchy is not preserved.
- To do flat synthesis simply type
flatten
after mapping standard cells.
- View the logic diagram:
- Comparing with the previous diagram, we can see that the sub-modules are no longer present and have been replaced by standard cells.
- We prefer sub-module level synthesis when we have many instantiations of the same module, or when we have a massive design with a large number of modules. In the second case, we synthesise the design portion by portion so that the tool can write a more optimised netlist.
- We can do this in Yosys by simply using the command:
synth -top submodule1
- This will synthesise only the specified module and not the entire design.
- A glitch is a short unwanted spike at the output. It occurs because of the delays in gates.
- I have simulated the same combinational circuit discussed above, i.e. (a&b)|c, in ModelSim with delays assigned to the gates. Here is the result:
- It can be seen that the output briefly goes low even though the logic value is 1.
- This can be prevented by inserting a D flip-flop in between the combinational circuits, as shown:
- This keeps the output at the flop stable, keeping the rest of the circuit stable. This process is called shielding: Q is shielded from changes in D because the flop only samples D at the clock edge.
- We need to initialise the flop before use, or it will hold a garbage value.
- Initialising can be done by resetting it. There are two types of reset:
- Asynchronous reset
- Synchronous reset
- A flop can have either one or both types of reset, as shown in the code:
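A minimal sketch of the three variants. The module names and the active-high reset polarity are assumptions made for illustration.

```verilog
// Asynchronous reset: q clears as soon as reset goes high.
module dff_asyncres (
    input  wire clk,
    input  wire reset,
    input  wire d,
    output reg  q
);
    always @(posedge clk, posedge reset) begin
        if (reset) q <= 1'b0;
        else       q <= d;
    end
endmodule

// Synchronous reset: reset is only sampled at the rising clock edge.
module dff_syncres (
    input  wire clk,
    input  wire reset,
    input  wire d,
    output reg  q
);
    always @(posedge clk) begin
        if (reset) q <= 1'b0;
        else       q <= d;
    end
endmodule

// Both kinds of reset in one flop; the asynchronous one has priority.
module dff_asyncres_syncres (
    input  wire clk,
    input  wire async_reset,
    input  wire sync_reset,
    input  wire d,
    output reg  q
);
    always @(posedge clk, posedge async_reset) begin
        if (async_reset)     q <= 1'b0;
        else if (sync_reset) q <= 1'b0;
        else                 q <= d;
    end
endmodule
```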
- A flop with asynchronous reset can be reset at any time, irrespective of the clock edge. That is why we have put
(posedge clk, posedge reset)
in the sensitivity list.
- A sensitivity list contains all the signals whose changes activate the always block.
- After simulating the design using Icarus Verilog, the following results were observed:
- We can see that the reset is activating irrespective of the clock edge.
- To synthesise a design with flops, we need to map the flops using
dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
followed by mapping with the abc command.
- After synthesis we get the following logic diagram:
- A flop with synchronous reset only resets when the reset is active at the clock edge. That is why the always block is only activated at the positive clock edge.
- In the above logic circuit we can see that the mux output will go low at reset = 1. But this reset only takes effect when the next positive clock edge arrives.
- After simulation using Icarus Verilog we observe the following:
- We can see that the effect of reset is only reflected in the output at the next positive clock edge.
- A flop with both asynchronous and synchronous resets will reset at both the conditions.
- We will work with the following codes for this section:
- The first code simply takes a 3-bit input and multiplies it by 2 to get a 4-bit output. The truth table is as follows:
- We can observe that the output is the input shifted left by one bit. This simplifies the entire logic circuit to just a few wires.
- After the
synth -top mult_2
command in Yosys we can observe that there are no cells:
- After synthesis we get the following logic diagram and netlist:
- In the second code we multiply the input by 9. This is a similar case where we do not need any logic components for synthesis.
- It is equivalent to:
- After synthesis we get the following logic diagram and netlist:
- We can see that the required components have been reduced to just some wires again.
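The two multiplications can be sketched as follows. For a 3-bit input a, a*2 is a with a zero appended, and a*9 = a*8 + a is a written twice side by side, which is why no cells are needed. The module name mult_9, the port names and the 6-bit output width are assumptions; the testbench checks both identities for every input value.

```verilog
// Multiply a 3-bit input by 2: same as appending a 0 bit.
module mult_2 (input wire [2:0] a, output wire [3:0] y);
    assign y = a * 2;
endmodule

// Multiply a 3-bit input by 9 (hypothetical module name).
// a*9 = a*8 + a; the shifted copy and a do not overlap, so y = {a, a}.
module mult_9 (input wire [2:0] a, output wire [5:0] y);
    assign y = a * 9;
endmodule

// Self-checking testbench: compares against pure wiring.
module tb_mult;
    reg  [2:0] a;
    wire [3:0] y2;
    wire [5:0] y9;
    integer i;

    mult_2 u2 (.a(a), .y(y2));
    mult_9 u9 (.a(a), .y(y9));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            a = i;
            #1;
            if (y2 !== {a, 1'b0} || y9 !== {a, a})
                $display("MISMATCH at a=%0d", a);
        end
        $display("check complete");
        $finish;
    end
endmodule
```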
- In optimisation, we try to reduce the number of components and the size of the logic circuit as much as possible.
- This reduces area and power consumption.
- There are two types of optimisation:
- Combinational logic optimisation
- Sequential logic optimisation
- Some techniques used to optimise combinational circuits are:
- Constant Propagation
- Boolean Logic Optimisation
- Consider the following logic: Y = ((AB)+C)', where A is grounded.
- With A = 0, the term AB is always 0, so the logic reduces to Y = C', a single inverter:
- This reduces the number of transistors in our design from 6 MOSFETs to 2.
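A small sketch of the same idea in Verilog, with assumed module and port names. The testbench ties a to 0 and checks that the output then behaves as a plain inverter on c, whatever the value of b.

```verilog
// Y = ((A & B) | C)'
module const_prop (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    assign y = ~((a & b) | c);
endmodule

// With a grounded, y must equal ~c for every b and c.
module tb_const_prop;
    reg  b, c;
    wire y;
    integer i;

    const_prop uut (.a(1'b0), .b(b), .c(c), .y(y));

    initial begin
        for (i = 0; i < 4; i = i + 1) begin
            {b, c} = i;
            #1;
            if (y !== ~c)
                $display("MISMATCH: b=%b c=%b y=%b", b, c, y);
        end
        $display("check complete");
        $finish;
    end
endmodule
```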
- Consider the following code:
- It is the code for a multiplexer circuit; let's see how to optimise it.
- After
synth -top opt_check3
use the command
opt_clean -purge
to optimise your design.
- According to Yosys documentation:
- opt_clean : This pass identifies wires and cells that are unused and removes them. Other passes often remove cells but leave the wires in the design or reconnect the wires but leave the old cells in the design. This pass can be used to clean up after the passes that do the actual work.
- -purge : also remove internal nets if they have a public name
- Here is a comparison between a non-optimised netlist and an optimised netlist:
- Consider the following boolean logic:
a?(b?c:(c?a:0)):(!c)
- It is one of many ways to design a multiplexer circuit in Verilog.
- When a = 1 the expression evaluates to c for either value of b, and when a = 0 it evaluates to !c, so the above circuit reduces to a simple XNOR gate of a and c.
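The reduction can be checked with a short truth-table testbench. Evaluating the expression by hand gives y = c when a = 1 and y = !c when a = 0, which is the XNOR of a and c. The module and port names below are assumptions.

```verilog
// The nested conditional expression from the text.
module bool_opt (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    assign y = a ? (b ? c : (c ? a : 1'b0)) : (!c);
endmodule

// Compare against a two-input XNOR for all eight input combinations.
module tb_bool_opt;
    reg  a, b, c;
    wire y;
    integer i;

    bool_opt uut (.a(a), .b(b), .c(c), .y(y));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            {a, b, c} = i;
            #1;
            if (y !== ~(a ^ c))
                $display("MISMATCH: a=%b b=%b c=%b y=%b", a, b, c, y);
        end
        $display("check complete");
        $finish;
    end
endmodule
```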
- Some techniques used to optimise sequential logic are:
- Sequential constant propagation
- State optimisation
- Retiming
- Cloning
- Consider the following circuit:
- We can observe that no matter what value of reset or clock is given, the output will always be 1. This makes the entire circuit redundant.
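One circuit with this property, given here only as an assumed example, is a flop whose reset sets q to 1 and whose data input is tied to 1.

```verilog
// Hypothetical example: both branches drive q to 1, so the flop
// never stores anything other than 1 and can be replaced by a constant.
module dff_const (
    input  wire clk,
    input  wire reset,
    output reg  q
);
    always @(posedge clk, posedge reset) begin
        if (reset) q <= 1'b1;   // reset sets the flop
        else       q <= 1'b1;   // data input is tied to 1
    end
endmodule
```

In simulation q is unknown until the first clock edge or reset; after that it stays at 1.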
- State optimisation refers to the optimisation of unused states. We try to implement a state machine with the least number of states possible.
- Retiming is the technique of moving logic across flops to balance path delays and improve the maximum operating frequency.
- Cloning: consider the following circuit:
- Assume we have ample positive slack at A, which means that data can reach A a little later, and flop C does not meet setup time due to the large distance from A. To solve this issue, we can duplicate (clone) A to meet the timing of C as shown below:
- Gate level simulation or GLS is the simulation of the gate-level netlist obtained from Yosys. This simulation is done using iverilog.
- In addition to the netlist, we will also pass the standard cell libraries and testbench to iverilog.
- By doing this, we are making sure we don't have any Synthesis-Simulation mismatch, i.e. our RTL simulation should give the same results as our GLS.
- Synthesis-Simulation mismatch can happen due to the following reasons:
- Incomplete sensitivity list
- Blocking and non-blocking statements
- We will explore this by taking the following example:
- In the code bad_mux.v, the sensitivity list only consists of sel. It means that the output will only change after sel has changed.
- This behaviour can be observed in the RTL simulation:
- We can observe that the output does not change when the inputs change. This is incorrect behaviour for a mux.
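A minimal sketch of the two styles, with assumed port names (i0, i1, sel, y). The only difference is the sensitivity list.

```verilog
// Incomplete sensitivity list: the block only re-evaluates when sel changes,
// so in RTL simulation y ignores changes on i0 and i1.
module bad_mux (
    input  wire i0,
    input  wire i1,
    input  wire sel,
    output reg  y
);
    always @(sel) begin
        if (sel) y <= i1;
        else     y <= i0;
    end
endmodule

// Complete sensitivity list: @(*) covers every signal read in the block.
module good_mux (
    input  wire i0,
    input  wire i1,
    input  wire sel,
    output reg  y
);
    always @(*) begin
        if (sel) y = i1;
        else     y = i0;
    end
endmodule
```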
- Now let's compare these results to GLS.
- First write a netlist using Yosys.
- We will now use iverilog to conduct GLS. Type the command:
iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v bad_mux_net.v tb_bad_mux.v
- From here, the procedure for gtkwave is the same. It will produce the following result:
- We can observe that this is the correct behaviour of a 2:1 mux. This is correct because the gate-level netlist is not concerned with the sensitivity list but only with the interconnections of various components.
- In the other two codes, we will not get any synthesis-simulation mismatch.
- Here is a comparison between the netlists of the good and bad mux:
- We can see that they are the same, thus proving that the netlist does not care about an incomplete sensitivity list.
- It should also be noted that Yosys will try to warn you if your design has an incomplete sensitivity list.
- Blocking: represented by =. Blocking statements execute in the order they are written.
- Non-blocking: represented by <=. All right-hand sides are evaluated when the always block is entered, and the results are then assigned to the left-hand sides.
- These come into the picture when using an always block.
- Let us consider the following code to understand the significance of blocking and non-blocking statements:
- Here is the logical circuit:
- We can see that x&c is assigned to y before the value of x is resolved. Thus it will look at the previous value of x, which resembles the behaviour of a latch. This is undesirable in combinational circuits. Let us look at the simulation result:
- We can see the output is considering the past value of x to get its result.
- This gets resolved when we conduct GLS:
- The results are now appearing properly.
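A sketch of the ordering problem, assuming x is computed as a | b (the module and port names are also assumptions). Swapping the two blocking assignments removes the dependence on the old value of x.

```verilog
// Problematic order: y is computed from x before x is updated.
module blocking_caveat (
    input  wire a,
    input  wire b,
    input  wire c,
    output reg  y
);
    reg x;
    always @(*) begin
        y = x & c;   // uses the value x held from the previous evaluation
        x = a | b;   // x is only updated afterwards
    end
endmodule

// Corrected order: x is resolved first, then used.
module blocking_fixed (
    input  wire a,
    input  wire b,
    input  wire c,
    output reg  y
);
    reg x;
    always @(*) begin
        x = a | b;
        y = x & c;
    end
endmodule
```

Both versions describe the same combinational logic, (a | b) & c, which is why the gate-level simulation does not show the stale value.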
- If statements in Verilog work similarly to if statements in other languages. They have priority, with the first condition having the highest priority.
- If statements in Verilog must be placed inside an always block.
- If statements are often used to realise multiplexer circuits. But we must be careful, as incomplete if statements give rise to inferred latches.
- Take the following code to understand this problem:
- The select line defines only one input (when select is 1) in the first code. This leaves the other input (when select is 0) undefined. Therefore the synthesis tool considers it as preserving the previous value of output.
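A minimal sketch of an incomplete if statement and one way to complete it. The module names and the value driven in the else branch are assumptions; i0 acts as the select line.

```verilog
// Incomplete if: nothing is assigned when i0 is 0, so y must hold
// its previous value and a latch is inferred.
module incomp_if (
    input  wire i0,
    input  wire i1,
    output reg  y
);
    always @(*) begin
        if (i0) y = i1;
    end
endmodule

// Complete if: every path assigns y, so the logic stays combinational.
module comp_if (
    input  wire i0,
    input  wire i1,
    output reg  y
);
    always @(*) begin
        if (i0) y = i1;
        else    y = 1'b0;   // assumed default value for illustration
    end
endmodule
```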
- When simulated we see the following result:
- It is observed that when the select line (i0) goes low, the output latches to its previous value.
- Even the synthesis report lets us know that there is a latch present in the design:
- A latch is also observed in the logic diagram:
- In the second code, a similar occurrence takes place. Here is what the code is trying to implement:
- The simulation results are:
- It is observed that when both select lines go low, the output will latch to its previous value.
- After synthesis we get this logic diagram:
- For better understanding, here is the logic gate diagram of the same:
- Case statements in Verilog are placed inside an always block, and the variable they assign must be declared as a reg.
- One must be cautious when using case statements:
- Incomplete cases cause inferred latches.
- Partial assignments also cause inferred latches.
- Overlapping cases give rise to unpredictable behaviour.
- Default statement does not guarantee the absence of inferred latches.
- Consider the following codes:
- In the first code, there is an attempt at creating a mux, but the code fails to define all possible values of sel.
- From the simulation it is observed that the output latches at sel = 10 (undefined):
- The synthesis result also shows a latch:
- Here is a simplified version of the same:
- In the second code we have fixed this issue by simply adding a default statement, which handles any undefined case.
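A sketch of the two versions, with assumed module and port names. The first leaves sel = 10 and sel = 11 undefined; the second routes them to i2 through a default branch.

```verilog
// Incomplete case: sel = 2'b10 and 2'b11 are not covered, so y is latched.
module incomp_case (
    input  wire       i0,
    input  wire       i1,
    input  wire       i2,
    input  wire [1:0] sel,
    output reg        y
);
    always @(*) begin
        case (sel)
            2'b00: y = i0;
            2'b01: y = i1;
        endcase
    end
endmodule

// Complete case: the default branch covers the remaining values of sel.
module comp_case (
    input  wire       i0,
    input  wire       i1,
    input  wire       i2,
    input  wire [1:0] sel,
    output reg        y
);
    always @(*) begin
        case (sel)
            2'b00:   y = i0;
            2'b01:   y = i1;
            default: y = i2;
        endcase
    end
endmodule
```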
- Proper operation is verified from the simulation:
- At sel = 10, the output follows i2 as defined by the default case.
- Synthesis results in the following:
- Here is a simplified version of the same:
- The following code contains a partial assignment:
- In the case sel = 01, it fails to assign a value to x.
- The simulation results are similar to the incomplete case; here only x gets latched to its previous value at sel = 01. This is an example of how a default case does not prevent inferred latches.
- The synthesis produces the following:
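A sketch of a partial assignment and a common way to avoid it. The module names and the exact values assigned in each branch are assumptions; the point is that x has no assignment when sel = 01 in the first version.

```verilog
// Partial assignment: x is not assigned in the 2'b01 branch,
// so a latch is inferred for x even though a default branch exists.
module partial_case_assign (
    input  wire       i0,
    input  wire       i1,
    input  wire       i2,
    input  wire [1:0] sel,
    output reg        y,
    output reg        x
);
    always @(*) begin
        case (sel)
            2'b00:   begin y = i0; x = i2; end
            2'b01:   y = i1;                    // x is missing here
            default: begin x = i1; y = i2; end
        endcase
    end
endmodule

// One possible fix: give every output a value before the case statement.
module full_case_assign (
    input  wire       i0,
    input  wire       i1,
    input  wire       i2,
    input  wire [1:0] sel,
    output reg        y,
    output reg        x
);
    always @(*) begin
        y = i2;                                 // defaults, overridden below
        x = i1;
        case (sel)
            2'b00: begin y = i0; x = i2; end
            2'b01: y = i1;                      // x keeps its default value
            default: ;                          // defaults already applied
        endcase
    end
endmodule
```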
- We can see in the reports that a latch has been generated.
- Here is a simplified version of the logic diagram:
- Consider the following code:
- Here sel = 10 satisfies both the 3rd and 4th cases. This overlap is undesirable and causes unpredictable behaviour as shown:
- Note that this is not the behaviour of an inferred latch; the synthesis report shows zero latches:
- This is resolved when we conduct a GLS:
- In Verilog we use two types of loops:
- for loop : Used for evaluating expressions.
- for generate : Used to instantiate hardware.
- Consider the following code:
- It uses the for loop to create a 4:1 mux. As we have seen previously, the same thing can be made using if statements, case statements, and boolean expressions. This method is beneficial in creating larger order mux, e.g.: 32:1 mux.
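A sketch of both constructs, with assumed module and port names. In the mux, the default assignment before the loop is an addition made in this sketch so that y is driven on every pass through the block. The second module shows a for generate loop that instantiates four AND gates.

```verilog
// 4:1 mux written with a for loop inside an always block.
module mux_for (
    input  wire       i0,
    input  wire       i1,
    input  wire       i2,
    input  wire       i3,
    input  wire [1:0] sel,
    output reg        y
);
    wire [3:0] i_int;
    integer k;

    assign i_int = {i3, i2, i1, i0};

    always @(*) begin
        y = 1'b0;                        // default so y is always assigned
        for (k = 0; k < 4; k = k + 1) begin
            if (k == sel)
                y = i_int[k];            // pick the input selected by sel
        end
    end
endmodule

// for generate: each iteration instantiates one AND gate.
module and_array (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire [3:0] y
);
    genvar g;
    generate
        for (g = 0; g < 4; g = g + 1) begin : gen_and
            and u_and (y[g], a[g], b[g]);
        end
    endgenerate
endmodule
```

Scaling either module to 32 inputs only requires changing the widths and the loop bound, not adding more lines.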
- Simulation result:
- An interesting thing was observed in the synthesis: the design had a latch in it:
- As seen in the logic diagram, this latch is not doing anything since it is always enabled. This might be because we have defined the output as a reg.
- The following code shows the difference between using case statements and a for loop for a design.
- It is observed that the for loop always takes 3 lines to describe a demux of any size, whereas with case we have to write out every condition.
- Simulation:
- Synthesis:
- We can see that there were no latches in this design, unlike mux_generate.v, and we did not use reg type outputs here. This suggests that the latch in mux_generate.v was related to the reg type output.
- I learned about RTL design and how to use iverilog and Yosys to simulate and synthesise designs. The most exciting part of the workshop was playing around with the .lib file and learning about the various parameters mentioned in it. In addition, I learned different optimisation techniques for both combinational and sequential logic. I also learned how GLS works and how to use the for constructs.
- Kunal Ghosh, Co-founder, VSD Corp. Pvt. Ltd.
- Shon Taware, RTL Design Engineer at Chipspirit Technologies
- And all the people who helped organise this workshop.