added synthesis chap
franout committed Sep 14, 2020
1 parent 52bc448 commit a1f681a
Showing 46 changed files with 583 additions and 122 deletions.
27 changes: 27 additions & 0 deletions report/appendices/appendix14.tex
@@ -0,0 +1,27 @@
%%% Appendix A
\chapter{UVM classes}
\label{appendix14}

\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_sequence_item.sv}


\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_sequence.sv}


\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_sequencer.sv}


\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_driver.sv}


\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_monitor.sv}


\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_env.sv}

\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_scoreboard.sv}

\lstinputlisting[style=sv,language=Verilog, breaklines=true]{../hardware/dlx/test_bench/uvm_class_def/dlx_test.sv}
% \lstinputlisting is an alternative way to import text or code from an external file. In this example the behavioural VHDL description of an adder contained in the file adder.vhd is imported.
% Note that you can set the language of the code that you want to import (VHDL in this example). When you set the language you will see the keywords of that specific language highlighted in your output pdf file.
%You can set a lot of parameters: for some examples take a look at the chapter 'How to document the project' that you can find in DLX_Project.pdf.
2 changes: 1 addition & 1 deletion report/chapters/architecture.tex
@@ -68,5 +68,5 @@ \subsection{ALU Multiplier}
\label{fig:alumul}
\end{figure}
\footnotetext[1]{Registers are in red}
As can be seen from Figure \ref{fig:alumulb}, the previously developed unit has been enhanced by pipelining it into an 8-stage multiplier (6 stages inside the multiplier and 2 outside the unit: the A and B registers and the ALU output register), reducing the critical path and increasing the achievable performance. It is worth mentioning that, since the unit is pipelined, it can potentially execute up to 8 multiplications in parallel, one after the other, without any data hazards and with a proper control unit to handle this specific situation.\\
As can be seen from Figure \ref{fig:alumulb}\footnote{The Figure represents an 8-bit multiplier. However, the approach is the same: only the internal stages of the multiplier have been pipelined.}, the previously developed unit has been enhanced by pipelining it into an 8-stage multiplier (6 stages inside the multiplier and 2 outside the unit: the A and B registers and the ALU output register), reducing the critical path and increasing the achievable performance. It is worth mentioning that, since the unit is pipelined, it can potentially execute up to 8 multiplications in parallel, one after the other, without any data hazards and with a proper control unit to handle this specific situation.\\
Moreover, the compiler script has also been modified to add the integer multiplication; the assigned opcode can be seen in Appendix \ref{appendix8}.
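
As a rough illustration of the benefit of pipelining, consider an idealized model, which is an assumption of this sketch and not a measured result: one multiplication is issued per clock cycle and no stall occurs. With $k$ pipeline stages, $N$ independent multiplications complete in
\[
C_{\mathrm{pipe}}(N) = k + (N - 1)
\]
clock cycles, whereas issuing each multiplication only after the previous one has left the pipeline would take $C_{\mathrm{seq}}(N) = k \cdot N$ cycles. For the 8-stage unit described above, $k = 8$ and $N = 8$ give
\[
C_{\mathrm{pipe}}(8) = 8 + 7 = 15, \qquad C_{\mathrm{seq}}(8) = 8 \cdot 8 = 64,
\]
so in this idealized case the eighth result is available after 15 cycles instead of 64. The symbols $k$, $N$, $C_{\mathrm{pipe}}$ and $C_{\mathrm{seq}}$ are introduced here only for illustration.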
Binary file modified report/chapters/figures/mult_pip.PNG
Binary file added report/chapters/files/1latency_area_power.xlsx
Binary file not shown.
Binary file added report/chapters/files/latency_area.pdf
Binary file not shown.
Binary file added report/chapters/files/latency_area_power.png
2 changes: 1 addition & 1 deletion report/chapters/functional_vt.tex
@@ -76,7 +76,7 @@ \section{Universal Verification Methodology}
\label{fig:tbuvm}
\end{figure}

Each class in Figure \ref{fig:tbuvm} inherits the functions and tasks of the corresponding UVM class. For this specific case, the functions of the UVM classes are:
Each class in Figure \ref{fig:tbuvm} inherits the functions and tasks of the corresponding UVM class. For this specific case, the functions of the UVM classes are (see Appendix \ref{}):
\begin{itemize}
\item Instruction item: it is the basic instruction for the microprocessor plus some additional functions, such as its conversion to string and the retrieval of the opcode only and/or of the opcode ALU function. It is important to mention that it includes the random variables for the opcode, opcode ALU function, r\textsubscript{d}, rs\textsubscript{1}, rs\textsubscript{2}, immediate and jump address (which are randomized when calling the relative function in the instruction sequence), plus the constraints on the registers, such as the one stating that r\textsubscript{0} cannot be a destination register. Moreover, depending on the current opcode, it composes the instruction accordingly, i.e. the jump address is not needed when composing an add instruction and vice versa, even if all the variables are randomized every time.
\item Instruction sequence: it creates a given number of random instruction items.
72 changes: 31 additions & 41 deletions report/chapters/physical_design.tex
@@ -3,62 +3,52 @@ \chapter{Physical Design}
The physical design of the unit has been carried out by means of a script (Appendix \ref{appendix12}). As for the synthesis, this script is in charge of preparing the environment and starting the actual physical design script.\\ This step is well known to be computationally intensive, even more than the synthesis, since the level of detail is finer than in the synthesis. Therefore, to boost performance, the script automatically sets the usage of six threads instead of only one.\\\\
Despite the variety in the design space, only a subset of the designs has been chosen to be placed on a die. In particular, the unconstrained design, the design with minimum area and 10\% less on the clock frequency, and the design with only 10\% less on the clock frequency. It is worth mentioning that the physical design does not use the RTL description but the gate-level netlist of the RTL, produced by the synthesis.\\
\section{Results}
In the following pages, results are presented as images of the design on the same die.
In the following pages, results are presented as images of the design on the same die.\\
As a first result of the physical design, it is worth presenting the ameba view, which distinguishes between the control unit and the datapath of the DLX. As can be seen from Figure \ref{fig:amebano} to Figure \ref{fig:ameba1minarea}, the area occupied by the control unit is always the same.\\\\
Moreover, from Figure \ref{fig:ameba} and Figure \ref{fig:place} it may seem that the pins are overlapping. However, this is not the case; it is only a matter of image resolution. Specifically, the pins on the top and right sides (for the IRAM and DRAM, respectively) overlap but lie on different levels of the die. On the other hand, all the control signals from/to the memories, as well as the clock and reset signals, are at the bottom of the left corner.\\\\
An important aspect is how the datapath area changes, keeping in mind the datapath boundaries from Figure \ref{fig:ameba} and looking at Figure \ref{fig:place}. From A to C there is a consistent area reduction: the area shrinks by a factor of almost 2. Comparing Figure \ref{fig:placno} with Figure \ref{fig:plac10}, even if there is a slight increase in the clock frequency, the synthesis strategies are different (from a naive synthesis to a synthesis in which optimization effort is spent to improve the design metrics). The only difference with Figure \ref{fig:plac1minarea} is the constraint of finding the design with the minimum area.

\begin{figure}[!htbp]
\centering
\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=\linewidth,scale=0.5,angle=0]{../project/physical_design/images_nopt/DLX_IR_SIZE32_PC_SIZE32_nopt_ameba_prerouting.jpg}
\caption{Ameba view of unconstrained design}
\includegraphics[width=\linewidth,scale=0.6,angle=0]{../project/physical_design/images_nopt/DLX_IR_SIZE32_PC_SIZE32_nopt_ameba_prerouting.jpg}
\caption{Unconstrained design}
\label{fig:amebano}
\end{subfigure}
\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=\linewidth,scale=0.5,angle=0]{../project/physical_design/images_10/DLX_IR_SIZE32_PC_SIZE32_10_ameba_prerouting.jpg}
\caption{Ameba view of design with 10\% less on clock frequency}
\includegraphics[width=\linewidth,scale=0.6,angle=0]{../project/physical_design/images_10/DLX_IR_SIZE32_PC_SIZE32_10_ameba_prerouting.jpg}
\caption{10\% more on clock frequency}
\label{fig:ameba10}
\end{subfigure}

\label{fig:ameba}
\end{figure}


\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
\includegraphics[scale=0.5,angle=0]{../project/physical_design/images_1_minarea/DLX_IR_SIZE32_PC_SIZE32_1_minarea_ameba_prerouting.jpg}
\caption{Ameba view of design with 1\% less on clock frequency and minimum area}

\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=1\linewidth,scale=0.6,angle=0]{../project/physical_design/images_1_minarea/DLX_IR_SIZE32_PC_SIZE32_1_minarea_ameba_prerouting.jpg}
\caption{1\% more on clock frequency and minimum area}
\label{fig:ameba1minarea}
\end{subfigure}
\caption{Ameba view}
\label{fig:ameba}
\end{figure}


\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
\includegraphics[scale=0.5,angle=0]{../project/physical_design/images_nopt/DLX_IR_SIZE32_PC_SIZE32_nopt_place_prerouting.jpg}
\caption{Placement of unconstrained design}
\centering
\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=\linewidth,scale=0.6,angle=0]{../project/physical_design/images_nopt/DLX_IR_SIZE32_PC_SIZE32_nopt_place_prerouting.jpg}
\caption{Unconstrained design}
\label{fig:placno}
\end{figure}




\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
\includegraphics[scale=0.5,angle=0]{../project/physical_design/images_10/DLX_IR_SIZE32_PC_SIZE32_10_place_prerouting.jpg}
\caption{Placement of design with 10\% less on clock frequency}
\end{subfigure}
\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=\linewidth,scale=0.6,angle=0]{../project/physical_design/images_10/DLX_IR_SIZE32_PC_SIZE32_10_place_prerouting.jpg}
\caption{10\% more on clock frequency}
\label{fig:plac10}
\end{figure}






\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
\includegraphics[scale=0.5,angle=0]{../project/physical_design/images_1_minarea/DLX_IR_SIZE32_PC_SIZE32_1_minarea_place_prerouting.jpg}
\caption{Placement of design with 1\% less on clock frequency and minimum area}
\end{subfigure}

\begin{subfigure}[b]{0.4\linewidth}
\includegraphics[width=1\linewidth,scale=0.6,angle=0]{../project/physical_design/images_1_minarea/DLX_IR_SIZE32_PC_SIZE32_1_minarea_place_prerouting.jpg}
\caption{1\% more on clock frequency and minimum area}
\label{fig:plac1minarea}
\end{subfigure}
\caption{Placement view}
\label{fig:place}
\end{figure}
47 changes: 38 additions & 9 deletions report/chapters/synthesis.tex
@@ -1,29 +1,58 @@
\chapter{Synthesis}
\label{Synthesis}
The synthesis of the design has been carried out by a script (Appendix \ref{appendix11}), which also contains the proposed synthesis algorithms. In addition, the script is in charge of adjusting the environment variables and folders needed for the synthesis and, later, for the physical design script.\\\\
The synthesis has been done through an inductive approach. As a first step, a simple design without any constraints has been synthesized and evaluated. To move inside the design space, the next step has been to constrain the clock to different percentage values of the clock of the non-constrained synthesized design. Different percentages have been used, from 1\% up to 20\% (reduction of clock frequency with respect to the non-constrained synthesized clock). Moreover, in this case \textit{compile\_ultra} has been used as synthesis strategy to push more effort into general optimizations. To keep the area estimate as close as possible to that of a real microprocessor, Scan Flip Flops (available in the used library) have been used instead of the normal Flip Flops.\\
The synthesis has been done through an inductive approach. As a first step, a simple design without any constraints has been synthesized and evaluated. To move inside the design space, the next step has been to constrain the clock to different percentage values of the clock of the non-constrained synthesized design. Different percentages have been used, from 1\% up to 20\% (increase of clock frequency with respect to the non-constrained synthesized clock). Moreover, in this case \textit{compile\_ultra} has been used as synthesis strategy to push more effort into general optimizations. To keep the area estimate as close as possible to that of a real microprocessor, Scan Flip Flops (available in the used library) have been used instead of the normal Flip Flops.\\
As a next degree of freedom in the design space, all the previous designs with different clock constraints have been synthesized with an additional minimum area constraint.

\section{Results}
The results in terms of area, latency and power are collected and presented as graphs.\\
As first design space, the latency-area graph can be seen in Figure \ref{fig:lat_area}:
add a latency area graph
\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
%\includegraphics[scale=0.35,angle=0]{./figure/graphs/utilization_factor_30mhz_int16.pdf}
\caption{Design space: Area vs Latency}
\includegraphics[scale=0.6,angle=0]{./chapters/files/latency_area.pdf}
\caption{Design space: Area \protect\footnotemark[1] vs Latency}
\label{fig:lat_area}
\end{figure}

where .....\\\\
\footnotetext[1]{Cell area}
In this design space, the best design according to latency and area is the one synthesized with a clock frequency 20\% greater than that of the non-constrained design and without any constraint on the area.

Moreover, a further extension of the previous design space may be the power consumption (on which constraints can also be added). Figure \ref{fig:area_power_latency} presents the power consumption of the designs obtained with the constraints on area and latency.

\begin{figure}[!htbp]
\centering
\captionsetup{justification=centering}
%\includegraphics[scale=0.35,angle=0]{./figure/graphs/utilization_factor_30mhz_int16.pdf}
\includegraphics[scale=0.25,angle=0]{./chapters/files/latency_area_power.png}
\caption{Design space: Area vs Latency vs Power}
\label{fig:area_power_latency}
\end{figure}
\end{figure}\\
The best design is still the one synthesized with 20\% more frequency. This result is probably due to the synthesis strategies and choices, which lead to the best results for this design in terms of power, area and latency with only the constraint on the latency.\\\\


The following table summarizes the results from the synthesis reports:\\

\begin{table}[h!]
\centering
\begin{tabular}{ |p{3cm}||p{3cm}|p{3cm}|p{3cm}| }
\hline
Design & Area [\textmu $m^2$] & Latency [ns] & Power [\textmu W]\\
\hline
No optimization &30034,85832 &24,96& 3,40E+07\\
\hline
+ 20\% frequency & 17896,21415 &16,14 &1,26E+07\\
\hline
+ 10\% frequency &25306,70806& 18,15& 1,87E+07\\
\hline
+ 1\% frequency & 24199,08403 & 19,97 &1,69E+07\\
\hline
+ 20\% frequency and minarea & 24649,95403 &16,14& 1,98E+07\\
\hline
+ 10\% frequency and minarea&23870,84002 & 18,15& 1,78E+07\\
\hline
+ 1\% frequency and minarea& 24569,62203 & 19,97 & 1,69E+07\\
\hline

\end{tabular}
\caption{Area-Latency-Power points in Design space}
\label{table:1}
\end{table}
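
As a worked check of the figures in Table \ref{table:1}, the ratios between the +20\% frequency design and the non-optimized one can be computed directly from the reported values (rounded to three decimal places; the symbols $A$, $L$ and $P$ for area, latency and power are introduced here only for illustration):
\[
\frac{A_{+20\%}}{A_{\mathrm{no\,opt}}} = \frac{17896.21}{30034.86} \approx 0.596,
\]
\[
\frac{L_{+20\%}}{L_{\mathrm{no\,opt}}} = \frac{16.14}{24.96} \approx 0.647,
\]
\[
\frac{P_{+20\%}}{P_{\mathrm{no\,opt}}} = \frac{1.26 \times 10^{7}}{3.40 \times 10^{7}} \approx 0.371 .
\]
That is, according to the table, this design uses about 40\% less area, has about 35\% lower latency and consumes about 63\% less power than the non-optimized design.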
