A top down view of the architecture

Like most digital systems, the processor is divided into two main blocks, as shown in the figure:

Control Unit: the brain of the processor. It handles the synchronization between stages by asserting the proper signals.
Datapath: the brawn of the processor. It is composed of 5 functional units (that is, it is a 5-stage pipelined processor) that perform the data processing operations.

DLX top level entity

Notice that the clock and reset signals are routed to every unit (these interconnections are omitted from the figure for readability). Moreover, the debug signals are present only for simulation purposes; they are removed during synthesis by means of synthesis pragmas.

Control Unit

The brain of the processor is a simple control unit based on two states, fetch and decode. The fetch state is executed only at reset, which leaves a nop in the pipeline. Once the processor is fully operational, it always stays in the decode state.

The control unit follows the hardwired approach: each instruction has a predefined signature of signals to be asserted (except for particular cases such as the sub instruction, where the carry-in must be set to 1 because the adder is shared between addition and subtraction). All the predefined signals for a given instruction are generated only once. Therefore, to keep the pipeline correctly synchronized, they have to be delayed by means of registers until the stage that uses them.
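The delaying of control signals described above can be sketched as a small shift register. This is only an illustrative model, not the project's implementation: the names `make_delay_line` and `clock` are hypothetical, and each call to `clock` stands for one rising clock edge.

```python
from collections import deque


def make_delay_line(depth, idle):
    """Return a shift register with `depth` stages, pre-filled with `idle`."""
    return deque([idle] * depth, maxlen=depth)


def clock(line, new_value):
    """Model one clock edge: shift `new_value` in and return the value leaving the line."""
    oldest = line[0]
    line.append(new_value)  # maxlen makes the deque drop the oldest entry
    return oldest


# Hypothetical usage: a signal generated in decode and consumed 3 cycles later.
write_enable = make_delay_line(3, 0)
trace = [clock(write_enable, v) for v in [1, 0, 0, 0, 0]]
print(trace)
```

A value shifted in on one edge leaves the line `depth` edges later, which is the behaviour a chain of `depth` registers gives to a control signal.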
The control unit is also in charge of selecting the proper operation for the ALU. During an integer multiplication, it stalls the pipeline for the whole duration of the operation in order to avoid hazards. It may instead restore normal pipeline behavior when one of the multiplication operands is zero and/or bigger than 2¹⁶ − 1, since the multiplication is between 16-bit operands.

Datapath

The datapath of the DLX is composed of 5 stages, as shown in the figure:

Datapath

Fetch Stage: it uses the value of the PC as the address for the instruction memory, and the instruction returned by the memory is saved into the IR. It also computes the new value of the PC: PC + 4 during normal operation, or the unchanged PC value when the pipeline is stalled.
Decode Stage: it decodes the instruction and, depending on the instruction type, either selects the correct register file addresses and saves the values read into the A and B registers, or extends the immediate field (from 16 to 32 bits for an immediate instruction, or from 26 to 32 bits for a jump instruction).
Execute Stage: the ALU operates on its inputs according to the current operation, and the condition for taking a branch is evaluated in this stage. The ALU is an enhanced version of the one developed during the laboratories, to which the missing comparison operations have been added. The adder was also developed during the lab and is based on the Pentium 4 adder; it is shared between addition and subtraction by exploiting the properties of two’s complement binary representation.
Memory Stage: it accesses the data memory if needed and, for loads, saves the data read from memory into the LMD register. It also decides the value of the PC when a branch is taken.
Write Back Stage: it writes back into the register file either the ALU result or the data read from the data memory.
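As a rough behavioural sketch of three of the operations listed above (PC update, immediate extension, and the shared adder), assuming 32-bit registers and assuming the extension is a sign extension (the text only says "extends"). The function names are hypothetical and do not come from the project sources.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, stall):
    """Fetch stage: loop the PC back during a stall, otherwise add 4."""
    return pc if stall else (pc + 4) & MASK32


def sign_extend(value, bits):
    """Decode stage: extend a `bits`-wide field to 32 bits (sign extension assumed)."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def add_sub(a, b, subtract):
    """Execute stage: a single adder serves both operations.

    In two's complement, a - b equals a + (~b) + 1, so subtraction
    inverts b and sets the carry-in to 1.
    """
    operand_b = (~b & MASK32) if subtract else (b & MASK32)
    carry_in = 1 if subtract else 0
    return ((a & MASK32) + operand_b + carry_in) & MASK32


print(hex(sign_extend(0xFFFC, 16)))  # 16-bit immediate holding -4
print(add_sub(7, 5, subtract=True))
```

The same `add_sub` path is why the control unit has to treat sub as a special case: the only difference from add is the inverted operand and the carry-in.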

Moreover, the green registers between the stages (i.e. IF/ID, ID/IE, IE/IM and IM/IWB) are used to synchronize values[1]. For example, the register index used for writing back into the register file needs to be delayed by 3 clock cycles, because it is needed in the write back stage. Despite the logical view given in the figure, these registers are actually implemented inside the appropriate stage (unless shown otherwise, as for the A and B registers).

ALU Multiplier

Another important component of the ALU is the integer multiplier.
It is based on the Booth’s multiplier developed during the lab, shown in the figure.

Booth’s multiplier

Pipelined Booth’s multiplier

As the figure shows, the previously developed unit has been pipelined to obtain an 8-stage multiplier (6 stages inside the multiplier and 2 outside the unit: the A and B registers and the ALU output register), reducing the critical path and increasing the achievable performance. It is worth mentioning that, since the unit is pipelined, it can potentially have up to 8 multiplications in flight, issued one after the other, without any data hazards, provided that the control unit properly handles this situation.
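For readers unfamiliar with Booth's algorithm, the following is a minimal software sketch of radix-2 Booth recoding on 16-bit signed operands. The radix, the signedness and the function name `booth_multiply` are assumptions made for illustration; the text does not state which Booth variant the hardware uses, and the sketch does not model the pipeline stages.

```python
def booth_multiply(multiplicand, multiplier, bits=16):
    """Multiply two signed `bits`-wide integers using radix-2 Booth recoding."""
    mask = (1 << bits) - 1
    m = multiplier & mask      # two's complement bit pattern of the multiplier
    product = 0
    previous_bit = 0           # implicit bit to the right of the LSB
    for i in range(bits):
        current_bit = (m >> i) & 1
        if (current_bit, previous_bit) == (1, 0):
            product -= multiplicand << i   # a run of ones starts here
        elif (current_bit, previous_bit) == (0, 1):
            product += multiplicand << i   # a run of ones ended just before
        previous_bit = current_bit
    return product


print(booth_multiply(3, -4))
print(booth_multiply(-7, -6))
```

Each loop iteration corresponds to one partial product; in hardware these partial products are what the internal stages of the multiplier accumulate.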
Moreover, the compiler script has also been modified to add the integer multiplication with the proper opcode.

[1] The figure represents an 8-bit multiplier. However, the approach is the same: only the internal stages of the multiplier have been pipelined.
