2001Arpit/RTL-Design-With-SKY130


RTL-Design-With-SKY130

image

Overview

This workshop aims to teach:

  • Verilog coding guidelines
  • Synthesizable Verilog codes
  • Synthesis
  • Optimizations
  • Synthesis-Simulation mismatch
  • If statements
  • For loop and For generate.

We will do this using iverilog for simulation and Yosys for synthesis, together with the sky130 standard cell library. The library tells the synthesis tool which cells are available to build the design and describes each cell's power consumption, delay, area, etc.

Table of Contents

Day 1

Day 2

Day 3

Day 4

Day 5

Introduction

RTL (Register Transfer Level) Design

Transistor-level design connects transistors into circuits to build gates or other components. Combinational or sequential circuits whose building blocks are primarily logic gates are called logic-level design. When the building blocks are registers and other datapath components, the design describes how data is transferred from registers, through datapath components such as adders, and back into registers. This is called register-transfer level design, or RTL design.

Icarus Verilog

  • iverilog (Icarus Verilog) is a simulation and synthesis tool. We will use it as a simulation tool only and do the synthesis with Yosys. The simulation results are viewed in a waveform viewer called gtkwave, which reads a .vcd file. VCD stands for value change dump: the file records every change in signal values during the simulation, and gtkwave uses it to draw the waveforms.
  • A testbench provides these 'changes' in values. A testbench is a Verilog module that checks the functionality of our design by applying various possible inputs to it.
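To make the idea of a 'value change dump' concrete, the Python sketch below writes VCD-style text for a 2-bit signal that counts 0, 1, 2 like the counter used next. It only illustrates the file layout under assumed values (a 1 ns timescale, a 10 ns clock period, and hypothetical scope and signal names tb_counter and count); in the real flow the simulator writes the .vcd file when the testbench calls $dumpfile/$dumpvars.

```python
def make_vcd(values, width=2, period=10, signal="count", scope="tb_counter"):
    """Return VCD text for `signal` taking one value from `values` per `period` ns.

    Only changes are dumped: a timestamp is emitted when the value differs
    from the previous one, which is what "value change dump" means.
    """
    ident = "!"  # short identifier code that stands for the signal
    lines = [
        "$timescale 1ns $end",
        f"$scope module {scope} $end",
        f"$var reg {width} {ident} {signal} $end",
        "$upscope $end",
        "$enddefinitions $end",
    ]
    previous = None
    for step, value in enumerate(values):
        if value != previous:
            lines.append(f"#{step * period}")            # simulation time
            lines.append(f"b{value:0{width}b} {ident}")  # new binary value
            previous = value
    return "\n".join(lines) + "\n"


# A counter that counts 0, 1, 2 and wraps back to 0, one value per clock period.
print(make_vcd([0, 1, 2, 0, 1, 2]))
```

A new timestamp is written only when the value actually changes, which keeps VCD files compact.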

image

  • Let's take the example of a counter that counts from 0 to 2:

counter program and test bench

  • To compile these files with iverilog, use the following command: iverilog good_counter.v tb_good_counter.v
  • If there are no errors, this creates an executable named a.out in your current working directory. Run it with ./a.out or vvp a.out; running it produces the .vcd file used to view the waveform.
  • The .vcd file name is set by the $dumpfile call in the testbench; in this example it is tb_good_counter.vcd.
  • Once the .vcd file has been generated, we can view the output using gtkwave: gtkwave tb_good_counter.vcd

gtkwave counter

  • We can verify the counter operation using this waveform. You can also change the colour of individual signals for convenience.

Yosys

  • We will be using Yosys for the synthesis of our Verilog designs. It will convert our RTL design to a gate-level netlist.
  • Yosys uses a synthesis script to read a design from a Verilog file, synthesise it into a gate-level netlist using the cell library, and write the result as a Verilog netlist. Here the script is typed command by command in the Yosys shell.
  • We will consider the same example as mentioned above.
  • To start Yosys just type yosys in the terminal.
  • We will start by reading our library files; this lets the synthesizer know what standard cells we are using:
  • read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib

read_liberty

  • According to Yosys documentation:
    • read_liberty: Read cells from liberty file as modules into current design.
    • -lib: Only create empty blackbox modules.
  • Note that the path to the .lib file may differ on your setup.
  • Now we will read our Verilog file:
  • read_verilog good_counter.v
  • After reading the Verilog file, it will let you know if there are any errors or warnings.
  • We will now synthesise the design:
  • synth -top good_counter
  • According to Yosys documentation:
    • synth: This command runs the default synthesis script. This command does not operate on partly selected designs.
    • -top: Use the specified module as top module.
  • Now we will map the design to the specified cell library (in our case the sky130 library):
  • abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
  • According to Yosys documentation:
    • abc: This pass uses the ABC tool for technology mapping of yosys's internal gate library to a target architecture.
    • -liberty: generate netlists for the specified cell library (using the liberty file format).
  • To view the resulting design we will simply type show.

counter show

  • The various blocks present in the design are nothing but standard cells which are specified in the .lib file.
  • One observation made during this workshop is that buffers are inserted in the design while synthesising combinational circuits, but that was not the case with sequential circuits. Here is an example of a mux:

mux show

  • I believe that this is due to the .lib file we are using.

  • Now we can write the netlist for the design. A netlist is a gate-level description of your design that specifies the components and their interconnections.

  • write_verilog -noattr good_counter_net.v

  • Use -noattr to leave out attributes and make the netlist easier to read. Here is a comparison between netlists written with and without -noattr:
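The interactive commands above can also be kept in a script file and replayed with yosys -s <file>. The Python helper below is a hypothetical convenience that only assembles the command list used in this walkthrough (the library path is the one used here and may differ on your machine). The interactive show command is left out, and dfflibmap is included only for designs with flops, as described later in the flop section.

```python
LIB = "../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib"  # path may differ on your setup


def yosys_script(verilog_file, top, lib=LIB, has_flops=False):
    """Return the Yosys commands from this walkthrough as script text."""
    cmds = [
        f"read_liberty -lib {lib}",      # standard cells as blackbox modules
        f"read_verilog {verilog_file}",
        f"synth -top {top}",             # generic synthesis of the top module
    ]
    if has_flops:
        cmds.append(f"dfflibmap -liberty {lib}")  # map flops to library flops
    cmds += [
        f"abc -liberty {lib}",           # map remaining logic to library cells
        f"write_verilog -noattr {top}_net.v",
    ]
    return "\n".join(cmds) + "\n"


# Save the printed text as e.g. synth.ys and run it with: yosys -s synth.ys
print(yosys_script("good_counter.v", "good_counter"))
```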

mux write verilog noattr mux write verilog

Standard cell Library

  • sky130_fd_sc_hd__tt_025C_1v80.lib:
    • sky130 : 130 nm technology
    • tt : typical process corner
    • 025C : 25°C (operating temperature)
    • 1v80 : 1.80 V (operating voltage)
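The naming convention can be decoded mechanically. The Python sketch below splits a library file name into its PVT corner; the regular expression is an assumption based on names like the one above, plus the sky130 habit of marking a negative temperature with an n (e.g. n40C for -40°C).

```python
import re

# Assumed layout: <library>__<process>_<temperature>C_<volts>v<fraction>.lib,
# with an optional "n" before the temperature meaning a negative value.
CORNER = re.compile(
    r"^(?P<lib>\w+?)__(?P<proc>[a-z]{2})_(?P<neg>n?)(?P<temp>\d+)C_"
    r"(?P<vint>\d+)v(?P<vfrac>\d+)\.lib$"
)


def parse_corner(filename):
    """Return the library name and PVT corner encoded in a .lib file name."""
    m = CORNER.match(filename)
    if m is None:
        raise ValueError(f"unrecognised library name: {filename}")
    temp = int(m["temp"]) * (-1 if m["neg"] else 1)
    return {
        "library": m["lib"],
        "process": m["proc"],                         # tt = typical corner
        "temp_c": temp,                               # 025C -> 25
        "volts": float(f"{m['vint']}.{m['vfrac']}"),  # 1v80 -> 1.8
    }


print(parse_corner("sky130_fd_sc_hd__tt_025C_1v80.lib"))
```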

Different Flavours of cells

  • The .lib file is a collection of logical modules. It contains different 'flavours' of gates:
    • slow
    • fast
  • They refer to the logic gate's speed (propagation delay). We utilize different gates depending on our setup and hold requirements. For meeting setup time, we require faster cells, but we need slower cells to meet hold time.
  • Setup and hold times define a window around the clock edge in which the data must not change; toggling the data inside this window can cause metastability (an uncertain output).

image

  • In practice, setup analysis is used to improve performance and hold analysis is used to prevent data corruption.

  • What makes these cells fast or slow:

    • How quickly they charge/discharge the load capacitance.
    • Faster cells use wider transistors, whereas slower cells use narrower ones.
    • This also makes the faster cells occupy more area and consume more power.
  • Cell choice also depends on the target operating frequency. The maximum frequency is limited by the critical path (the path with the highest propagation delay), so that path needs fast cells, while paths with enough slack can use slower, smaller cells.

PVT

  • PVT stands for Process, Voltage, Temperature. These three parameters are among the first things you see when you open the library file, because a cell's delay and power depend on all of them.

pvt

  • You can search for specific cells in the file (in vim) with: :/<keyword>
  • Example: let's look for the a2111o cell. The function of this cell is ((a&b)|c|d|e).
  • :/a2111o

a2111o

  • The file lists a leakage power value for each of the 32 input combinations (5 inputs give 2^5 = 32), along with the cell's timing data.

  • We will now compare the different flavours of the same gate:

comparison

  • We can observe that the leakage power and area increase from left to right. They are all and2 gates, but each uses transistors of a different width; as the transistor width increases, so do the area and power of the cell.
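Returning to the a2111o cell described above, the short Python sketch below enumerates its 2^5 = 32 input combinations using the function quoted there, ((a&b)|c|d|e). It only models the logic function; the per-combination leakage values themselves come from the .lib file.

```python
from itertools import product


def a2111o(a, b, c, d, e):
    """Logic function of the a2111o cell as quoted above: (a & b) | c | d | e."""
    return (a & b) | c | d | e


# One row per input combination: 5 inputs give 2**5 = 32 rows.
rows = [(bits, a2111o(*bits)) for bits in product((0, 1), repeat=5)]
ones = sum(out for _, out in rows)
print(len(rows), "combinations,", ones, "with output 1")
```

Only 3 of the 32 rows give 0: those with c = d = e = 0 and a & b = 0.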

Synthesis

Hierarchical Synthesis

  • Sometimes a design contains multiple modules. These modules will have a hierarchy, with the top module being the one that calls the other modules.

  • Example: a full adder built from two half adders. Here the half adder module is instantiated twice within the full adder module.

  • Let's consider another example, where we are trying to implement this logic:

comb

  • The code is:

multiple module code

  • It consists of 2 sub-modules, one for OR operation and another for AND operation. Both of them are being instantiated in the top module.

  • We will now synthesise it using Yosys.

  • First, read the library file to import the standard cells.

  • Then read the Verilog file.

  • Synthesise the design: synth -top multiple_modules

multiple module synth

multiple module synth2

  • This shows all the sub-modules in the design along with the design hierarchy. The report also lists which cells will be used in each sub-module.

  • We will now map the design to the standard cells using abc -liberty.

  • We are now ready to view the logic diagram:

multiple modules show

  • Compared with the logic circuit we designed earlier, Yosys has preserved the hierarchy instead of using AND and OR cells directly: the sub-modules appear as blocks.
  • The written netlist preserves the hierarchy as well.
  • One interesting observation from the netlist is that instead of an OR gate, Yosys used a NAND gate with inverted inputs.
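The substitution is valid because of De Morgan's law: a | b equals ~(~a & ~b). A quick Python check over all four input combinations, modelling each gate as a function on 0/1 values:

```python
from itertools import product


def or2(a, b):
    return a | b


def nand2(a, b):
    return 1 - (a & b)


def inv(a):
    return 1 - a


# De Morgan: a | b == ~(~a & ~b), i.e. an OR is a NAND fed with inverted inputs.
for a, b in product((0, 1), repeat=2):
    assert or2(a, b) == nand2(inv(a), inv(b))
print("OR(a, b) equals NAND(~a, ~b) for every input combination")
```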

multiple module write verilog2

  • The reason for this is NOT that NAND gates save area or power: as the .lib file shows, the combined area of the inverters and the NAND gate is greater than that of a single OR gate.

comparison2

  • The real reason to choose NAND gates over OR gates is the arrangement of PMOS transistors inside the gates. In practice we avoid PMOS in series: the mobility of holes is lower than that of electrons, so a PMOS already needs a larger W/L ratio than an NMOS for the same drive, and stacking PMOS in series adds their resistances and increases switching delay. An OR gate (a NOR followed by an inverter) has series PMOS in its pull-up network, whereas a NAND gate keeps its PMOS in parallel, so the NAND-based version is preferred.

  • OR gate:

  • or-gate-cmos

  • NAND gate:

  • nand

Flat Synthesis

  • In flat synthesis, the hierarchy is not preserved.
  • To do flat synthesis, type flatten after mapping the standard cells.
  • View the logic diagram:

multiple modules flat show

  • Compared with the previous diagram, the sub-modules are no longer present and have been replaced by standard cells.

Sub-module Level Synthesis

  • Sub-module level synthesis is preferred when the same module is instantiated many times, or when the design is massive with a large number of modules. In these cases we synthesise the design portion by portion so that the tool can write a more optimised netlist.
  • We can do this in Yosys by simply using the command: synth -top submodule1.
  • This will synthesise only the specified module and not the entire design.

Flops

Glitch

  • A glitch is a short, unwanted spike at the output. It occurs because of delays in the gates.
  • I simulated the same combinational circuit discussed above, i.e. (a&b)|c, in ModelSim with delays assigned to the gates. Here is the result:

glitch code glitch

  • The output momentarily goes low even though the expected logic value is 1.
  • This can be prevented by inserting a D flip-flop between combinational stages, like this:

shield

  • The flop keeps its output stable, which keeps the rest of the circuit stable. This process is called shielding: Q only changes at the clock edge, so it is shielded from glitches on D.
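A unit-delay Python model reproduces the same effect. The gate delays (2 time units for the AND, 1 for the OR) and the input waveforms are assumptions chosen to trigger a glitch, not values from the .lib file; the flop is modelled as an ideal sampler at its clock edges.

```python
def simulate(a, b, c, and_delay=2, or_delay=1):
    """Unit-time model of y = (a & b) | c with a transport delay on each gate.

    a, b, c are lists of input bits per time step; inputs are assumed to have
    been stable at their t = 0 value before the simulation starts.
    """
    def at(signal, t):
        return signal[max(t, 0)]

    steps = len(a)
    ab = [at(a, t - and_delay) & at(b, t - and_delay) for t in range(steps)]
    return [at(ab, t - or_delay) | at(c, t - or_delay) for t in range(steps)]


# b rises and c falls at t = 1; the logic value of y is 1 before and after.
a = [1] * 8
b = [0] + [1] * 7
c = [1] + [0] * 7
y = simulate(a, b, c)
print("y:", y)  # the 0s at t = 2 and t = 3 are the glitch

# A flop clocked at t = 0 and t = 6 only samples y after it has settled.
clock_edges = [0, 6]
q = [y[t] for t in clock_edges]
print("q:", q)
```

The OR gate sees c fall before the slower AND output rises, which is exactly the race that produces the spike.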

Flop coding

  • A flop must be initialised before use; otherwise its output is unknown (X in simulation).
  • Initialising is done by resetting it. There are two types of reset:
    • Asynchronous reset
    • Synchronous reset
  • A flop can have either one or both types of reset, as shown in the code:

all reset

  • A flop with an asynchronous reset can be reset at any time, irrespective of the clock edge. That is why we have put (posedge clk, posedge reset) in the sensitivity list.
  • A sensitivity list contains the signals whose changes trigger the always block.

reset async

  • After simulating the design using Icarus Verilog, the following results were observed:

reset async in action

  • We can see that the reset is activating irrespective of the clock edge.
  • To synthesise designs with flops, we first map the flops using dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib and then map the rest of the logic with the abc command.
  • After synthesis we get the following logic diagram:

reset async show

  • A flop with synchronous reset only resets the flop when the reset is active at the clock edge. That is why the always block will only activate at the positive clock edge.

reset sync

  • In the above logic circuit, the mux output goes low when reset = 1, but this reset only takes effect at the next positive clock edge.
  • After simulation using Icarus Verilog we observe the following:

reset sync in action

  • We can see that the effect of reset is only reflected in the output at the next positive clock edge.
  • A flop with both asynchronous and synchronous resets will reset at both the conditions.

reset sync async
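The difference between the two resets comes down to when reset is looked at. The Python class below is a behavioural sketch, not the Verilog from the screenshots: set_reset models reset changing between clock edges and posedge models a rising clock edge. The class and method names are hypothetical.

```python
class DFF:
    """Behavioural model of a D flip-flop whose reset clears q to 0."""

    def __init__(self, async_reset):
        self.async_reset = async_reset
        self.reset = 0
        self.q = 0

    def set_reset(self, reset):
        """Change the reset input between clock edges."""
        self.reset = reset
        if self.async_reset and reset:
            self.q = 0  # asynchronous: takes effect immediately

    def posedge(self, d):
        """Rising clock edge: both kinds of reset are honoured here."""
        self.q = 0 if self.reset else d


for kind in (True, False):
    ff = DFF(async_reset=kind)
    ff.posedge(1)    # load a 1
    ff.set_reset(1)  # assert reset between clock edges
    before_edge = ff.q
    ff.posedge(1)    # next rising edge
    print("async" if kind else "sync", before_edge, ff.q)
```

The asynchronous flop clears as soon as reset rises; the synchronous one keeps its 1 until the next rising edge.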

Special Optimizations

  • We will work with the following codes for this section:

mult code

  • The first code takes a 3-bit input and multiplies it by 2 to get a 4-bit output. The truth table is as follows:

TRUTH

  • The output is simply the input shifted left by one bit, with 0 as the least significant bit. This reduces the entire logic circuit to just a few wires.
  • After synth -top mult_2 command in Yosys we can observe that there are no cells:

mult2 synth

  • After synthesis we get the following logic diagram and net list:

mult2 show

mult2 net

  • In the second code we multiply the input by 9. This is a similar case, where no logic components are needed for synthesis.
  • It is equivalent to:

mult9

  • After synthesis we get the following logic diagram and netlist:

mult9 show

mult9 net

  • We can see that the required components have been reduced to just some wires again.
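Both results can be checked exhaustively in Python for every 3-bit input. Multiplying by 2 is a left shift, and multiplying by 9 is (a << 3) + a; because a is only 3 bits wide, the two terms never overlap, so the sum is just the concatenation {a, a}:

```python
def mul2(a):
    """3-bit a times 2 as wiring only: the bits of a moved up, a 0 in the LSB."""
    return a << 1


def mul9(a):
    """3-bit a times 9 as wiring only: a copy of a above another copy of a."""
    return (a << 3) | a  # bits never overlap for a < 8, so OR equals addition


for a in range(8):  # every 3-bit input value
    assert mul2(a) == a * 2
    assert mul9(a) == a * 9
print(format(mul9(0b101), "06b"))  # 5 * 9 = 45
```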

Optimisations

  • Optimisation tries to reduce the number of components and the size of the logic circuit as much as possible, to get the most optimised design.
  • This results in area and power consumption reduction.
  • There are two types of optimisations:
    • Combinational logic optimisation
    • Sequential logic optimisation

Combinational Logic Optimisation

  • Some techniques used to optimise combinational circuits are:
    • Constant Propagation
    • Boolean Logic Optimisation

Constant Propagation:

  • Consider the following logic: Y = ((AB)+C)', where A is grounded.
  • The logic circuit of the following can be reduced as follows:

gates opt eg

  • With A = 0, AB = 0 and Y reduces to C'. This reduces the number of transistors in our design from 6 (the AND-OR-INVERT gate) to 2 MOSFETs (an inverter on C).
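The reduction can be confirmed with a truth-table check in Python, modelling Y = ((AB)+C)' as a function and tying A to 0:

```python
from itertools import product


def aoi21(a, b, c):
    """Y = ((A & B) | C)': an AND-OR-INVERT gate."""
    return 1 - ((a & b) | c)


def inverter(c):
    return 1 - c


# With A tied to ground (0), A & B is always 0, so Y reduces to C'.
for b, c in product((0, 1), repeat=2):
    assert aoi21(0, b, c) == inverter(c)
print("With A = 0, ((AB)+C)' equals C' for all B and C")
```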

  • Consider the following code:

check3 code

  • It is the code for a multiplexer circuit; let's see how to optimise it.
  • After synth -top opt_check3, use the command opt_clean -purge to optimise the design.
  • According to Yosys documentation:
    • opt_clean : This pass identifies wires and cells that are unused and removes them. Other passes often remove cells but leave the wires in the design or reconnect the wires but leave the old cells in the design. This pass can be used to clean up after the passes that do the actual work.
    • -purge : also remove internal nets if they have a public name
  • Here is a comparison between a non-optimised netlist and an optimised netlist:

check3 no_opt netlist check3 opt netlist

Boolean Logic Optimisation

  • Consider the following boolean logic: a?(b?c:(c?a:0)):(!c)
  • It is one of many ways to design a multiplexer circuit in Verilog.

mux comb opt

  • The above circuit can be reduced to a single XNOR gate: when a = 1, both branches of b?c:(c?a:0) give c, and when a = 0 the output is !c, so the output equals a XNOR c.
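Because the exact gate is easy to get wrong, the Python sketch below checks the expression exhaustively: for all eight input combinations it equals a XNOR c (an XOR followed by an inverter), and b has no effect.

```python
from itertools import product


def original(a, b, c):
    # Direct translation of a ? (b ? c : (c ? a : 0)) : (!c)
    return (c if b else (a if c else 0)) if a else (1 - c)


def xnor(a, c):
    return 1 - (a ^ c)


assert all(original(a, b, c) == xnor(a, c) for a, b, c in product((0, 1), repeat=3))
print("a ? (b ? c : (c ? a : 0)) : !c  ==  a XNOR c")
```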

Sequential Logic Optimisation

  • Some techniques used to optimise sequential logic are:
    • Sequential constant propagation
    • State optimisation
    • Retiming
    • Cloning

Sequential Constant Propagation

  • Consider the following circuit:

seq

  • No matter what values of reset and clock are applied, the output is always 1. The flop is therefore redundant and can be replaced by a constant 1.

State Optimisation

  • It refers to the optimisation of unused states. We try to implement a state machine with the least number of states possible.

Retiming

  • Retiming moves logic across flop boundaries to balance the delay between stages, which improves the maximum operating frequency.

Cloning

  • Consider the following circuit:

state opt

  • Assume flop A has ample positive slack, meaning its data can arrive a little later, while flop C does not meet setup time because of its large distance from A. To solve this, we can duplicate (clone) A so that a copy sits closer to C, as shown below:

state opt2

Gate Level Simulation

  • Gate level simulation (GLS) is the simulation of the gate-level netlist obtained from Yosys. This simulation is also done using iverilog.
  • In addition to the netlist, we pass the Verilog models of the standard cells and the testbench to iverilog.
  • By doing this, we are making sure we don't have any Synthesis-Simulation mismatch, i.e. our RTL simulation should give the same results as our GLS.
  • Synthesis-Simulation mismatch can happen due to the following reasons:
    • Incomplete sensitivity list
    • Blocking and non-blocking statements

Incomplete sensitivity list

  • We will explore this by taking the following example:

mux codes

  • In bad_mux.v, the sensitivity list contains only sel. This means the output is only re-evaluated when sel changes.
  • This behavior can be observed in the RTL simulation:

bad mux rtl simulation

  • The output does not change when the inputs change, which is incorrect behaviour for a mux.
  • Now let's compare these results with GLS.
  • First, write the netlist using Yosys.
  • Then use iverilog to run GLS with the command:
  • iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v bad_mux_net.v tb_bad_mux.v
  • From here, the procedure for gtkwave is the same. It will produce the following result:

bad mux gls simulation

  • We can observe that this is the correct behaviour of a 2:1 mux. This is correct because the gate-level netlist is not concerned with the sensitivity list but only with the interconnections of various components.
  • For the other two codes, there is no synthesis-simulation mismatch.
  • Here is a comparison between the netlists of the good and bad mux:

good mux net bad mux net

  • We can see that they are the same, thus proving that the netlist does not care about an incomplete sensitivity list.
  • It should also be noted that Yosys will try to warn you if your design has an incomplete sensitivity list.
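The mismatch can be reproduced with a small Python model. It assumes bad_mux.v is a 2:1 mux with inputs i0, i1 and select sel (the signal names are assumptions); the first function only re-evaluates the output when sel changes, as the simulator does with always @(sel), while the second behaves like the synthesised gates.

```python
def rtl_bad_mux(events):
    """Mimic always @(sel): y is re-evaluated only when sel changes."""
    y, prev_sel, out = 0, None, []
    for sel, i0, i1 in events:
        if sel != prev_sel:  # sel is the only signal in the sensitivity list
            y = i1 if sel else i0
            prev_sel = sel
        out.append(y)
    return out


def netlist_mux(events):
    """Gate-level 2:1 mux: the output always follows the selected input."""
    return [i1 if sel else i0 for sel, i0, i1 in events]


# Each event is (sel, i0, i1); the data inputs toggle while sel is held.
events = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 1, 1)]
print("RTL:", rtl_bad_mux(events))
print("GLS:", netlist_mux(events))
```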

Blocking and Non-Blocking Statements

  • Blocking: represented by =. Statements execute in the order they are written.
  • Non-blocking: represented by <=. All right-hand sides are evaluated when the always block is entered, and the left-hand sides are updated afterwards.
  • These matter when using always blocks.
  • Let us consider the following code to understand the significance of blocking and non-blocking statements:

Screenshot (2)

  • Here is the logical circuit:

blocking caveat diagram

  • Here x&c is assigned to y before x is updated, so y uses the previous value of x. This mimics a storage element (latch-like behaviour), which is undesirable in a combinational circuit. Let us look at the simulation result:

blocking caveat gtk

  • We can see the output is considering the past value of x to get its result.
  • This mismatch disappears in GLS, because the netlist contains only the gates:

blocking caveat gls gtk

  • The results are now appearing properly.
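The ordering problem can be mimicked in Python. Based on the description above, the sketch assumes the always block computes y = x & c before x = a | b (the names a, b, c, x, y are assumptions about the screenshot's code); the second function is what the synthesised gates compute.

```python
def block_as_written(x_old, a, b, c):
    # Assumed order: y is computed first, so it sees x from the previous evaluation.
    y = x_old & c
    x = a | b
    return x, y


def hardware(a, b, c):
    # The synthesised gates simply compute x = a | b and y = x & c.
    x = a | b
    return x, x & c


x = 0                                      # a = b = 0 previously
x, y_sim = block_as_written(x, 1, 0, 1)    # a rises with c = 1
print("RTL sim y =", y_sim, "| gates y =", hardware(1, 0, 1)[1])
```

Writing x = a | b before y = x & c removes the stale value from the RTL simulation.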

If Statements

  • If statements in Verilog work like if statements in other languages. An if / else if chain has priority: the first condition has the highest priority.
  • If statements in Verilog must be placed inside an always block.
  • If statements are often used to realise multiplexer circuits, but we must be careful, as incomplete if statements give rise to inferred latches.
  • Take the following code to understand this problem:

incomplete if code

  • In the first code, the select line defines only one input (when select is 1). The other case (select is 0) is left undefined, so the synthesis tool treats it as preserving the previous value of the output.
  • When simulated we see the following result:

incomplete if gtk

  • It is observed that when the select line (i0) goes low, the output latches to its previous value.
  • Even the synthesis report lets us know that there is a latch present in the design:

incomplete if synth

  • A latch is also observed in the logic diagram:

incomplete if show
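The latching behaviour of the first code can be reproduced with a Python model. Following the description above, i0 is treated as the select line; the data input name i1 and the initial output value are assumptions.

```python
def incomplete_if(samples, y=0):
    """Model of: always @(*) if (i0) y = i1;   (no else branch)

    samples is a sequence of (i0, i1) pairs; y starts at an assumed value of 0.
    """
    out = []
    for i0, i1 in samples:
        if i0:
            y = i1  # select high: y follows i1 (latch is transparent)
        # select low: y is not assigned, so it keeps its old value (latch holds)
        out.append(y)
    return out


print(incomplete_if([(1, 1), (0, 0), (0, 1), (1, 0)]))
```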

  • In the second code, something similar happens. Here is what the code is trying to implement:

incomplete if2 circuit

  • The simulation results are:

incomplete if2 gtk latching

  • It is observed that when both select lines go low, the output will latch to its previous value.
  • After synthesis we get this logic diagram:

incomplete if2 show

  • For better understanding, here is the logic gate diagram of the same:

incomplete if2 show circuit

Case Statements

  • Case statements in Verilog are placed inside an always block, and the output they assign must be a reg-type variable.
  • One must be cautious when using case statements:
    • Incomplete cases cause inferred latches.
    • Partial assignments also cause inferred latches.
    • Overlapping cases give rise to unpredictable behaviour.
    • Default statement does not guarantee the absence of inferred latches.

Incomplete Case

  • Consider the following codes:

cases code

  • The first code attempts to create a mux, but it fails to define all possible values of sel.
  • The simulation shows that the output latches at sel = 10 (an undefined case):

incomplete case gtk latching

  • The synthesis result also shows a latch:

incomplete case show

  • Here is a simplified version of the same:

incomplete case circuit

  • In the second code we have fixed this issue by simply adding a default statement, which will handle any undefined case.
  • Proper operation is verified from the simulation:

complete case gtk no latching

  • At sel = 10, the output follows i2 as defined by the default case.
  • Synthesis results in the following:

complete case show

  • Here is a simplified version of the same:

complete case circuit
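The two versions of the case mux can be compared with a Python model. It assumes the case items are 2'b00 → i0 and 2'b01 → i1, with the default routing i2, which matches the behaviour described above; the exact code is in the screenshot.

```python
def mux_case(samples, with_default, y=0):
    """Model of: case (sel) 2'b00: y = i0; 2'b01: y = i1; [default: y = i2;] endcase

    samples is a sequence of (sel, i0, i1, i2); y starts at an assumed value of 0.
    """
    out = []
    for sel, i0, i1, i2 in samples:
        if sel == 0b00:
            y = i0
        elif sel == 0b01:
            y = i1
        elif with_default:
            y = i2
        # without a default, sel = 2'b10 or 2'b11 leaves y unchanged (latch)
        out.append(y)
    return out


samples = [(0b00, 1, 0, 0), (0b10, 0, 0, 0), (0b01, 0, 1, 0)]
print("incomplete:", mux_case(samples, with_default=False))
print("with default:", mux_case(samples, with_default=True))
```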

Partial Assignment

  • The following code contains partial assignment:

partial case code

  • For sel = 01, the code fails to assign a value to x.
  • The simulation results are similar to the incomplete case, except that only x latches to its previous value at sel = 01. This shows that a default case does not prevent inferred latches.
  • The synthesis produces the following:

partial case synth partial case show

  • We can see in the reports that a latch has been generated.
  • Here is a simplified version of the logic diagram:

partial case circuit

Overlapping cases

  • Consider the following code:

bad case code

  • Here sel = 10 matches both the 3rd and the 4th case items. This overlap is undesirable and causes unpredictable behaviour, as shown:

bad case gtk

  • Note that this is not the behaviour of an inferred latch, as the synthesis report shows zero latches:

bad case synth

  • GLS of the synthesised netlist does not show this behaviour, so this is another synthesis-simulation mismatch:

bad case gls gtk

Loops

  • In Verilog we use two types of loops:
    • for loop : Used for evaluating expressions.
    • for generate : Used to instantiate hardware.

For Loops

  • Consider the following code:

mux gen code

  • It uses a for loop to create a 4:1 mux. As we have seen previously, the same mux can be made with if statements, case statements, or boolean expressions. This method is beneficial for creating larger muxes, e.g. a 32:1 mux.
  • Simulation result:

mux gen gtk

  • An interesting thing was observed in the synthesis, the design had a latch in it:

mux gen synth mux gen show

  • As seen in the logic diagram, this latch does nothing since it is always enabled. This might be because we have defined the output as a reg.
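The loop-based mux can be modelled in Python as below; the loop body (assign y when the index k equals sel) is an assumed shape of the code in the screenshot. The same three lines handle a 4:1 or a 32:1 mux. The model also shows that y stays unassigned (None) when no index matches sel. In hardware, an output that is not assigned on every path through an always block is the usual reason a tool infers a latch, which is one possible explanation for the latch seen above.

```python
def mux_for_loop(inputs, sel):
    """Model of: for (k = 0; k < N; k = k + 1) if (k == sel) y = inputs[k];"""
    y = None  # stays None if no index matches sel (an unassigned path)
    for k in range(len(inputs)):
        if k == sel:
            y = inputs[k]
    return y


four = [0, 1, 1, 0]
print([mux_for_loop(four, s) for s in range(4)])  # selects each input in turn

thirty_two = [int(k % 3 == 0) for k in range(32)]  # any 32 input values
assert all(mux_for_loop(thirty_two, s) == thirty_two[s] for s in range(32))
```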

  • The following code shows the difference between using a case statement and a for loop for a design.

demux codes

  • The for loop always takes the same 3 lines to make a demux of any size, whereas with a case statement every condition has to be written out.
  • Simulation:

demux case gtk

  • Synthesis:

demux case synth demux case show

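  • There were no latches in this design, unlike mux_generate.v, and we did not use reg-type outputs. This suggests that the latch in mux_generate.v was related to its reg-type output, although in general a latch is inferred when an output is not assigned on every path through a combinational always block.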
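A Python sketch of the loop-based demux shows why its length does not grow with the number of outputs. Clearing all outputs before the loop is an assumption about the screenshot's code; it is a common way to make sure every output is assigned on every path.

```python
def demux_for_loop(value, sel, n=8):
    """Model of a 1:n demux written with a loop.

    Every output is first cleared to 0, then the output whose index equals sel
    receives the input value, so no output is ever left unassigned.
    """
    out = [0] * n
    for k in range(n):
        if k == sel:
            out[k] = value
    return out


print(demux_for_loop(1, 3))       # 1:8 demux, input routed to output 3
print(demux_for_loop(1, 3, n=4))  # the same loop works for a 1:4 demux
```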

Conclusion

  • I learned about RTL design and how to use iverilog and Yosys to simulate and synthesise designs. The most exciting part of the workshop was exploring the .lib file and learning about the parameters it contains. In addition, I learned different optimisation techniques for both combinational and sequential logic, how GLS works, and how to use for constructs.

Acknowledgements

  • Kunal Ghosh, Co-founder, VSD Corp. Pvt. Ltd.
  • Shon Taware, RTL Design Engineer at Chipspirit Technologies
  • And all the people who helped organise this workshop.

About

RTL design workshop organized by Kunal Ghosh
