Skip to content

nrgbistro/ece350-32bit-cpu

Repository files navigation

ALU

Nolan Gelinas

Description of Design

  • ALU
    • Adder
      • I combined 8 1-bit full adders together and modified it to output P and G instead of carry out
      • I also created a CLA module which takes in an 8-bit bus of P's and G's and outputs 8-bit C values as well as GG and PG
      • I combined 4 of these 8-bit carry lookahead adders and connected them using the calculated GG and PG for each set of 8 bits
      • I also created a 2's compliment module to assist with subtraction
        • There were some edge cases for subtraction with the overflow bit where I can to calculate when the two's compliment would overflow. I used a 1-bit mux to get around this
    • and/or
      • This module is where I learned how to use genvar
      • I used a for loop that iterates 32 times which creates 32 and/or gates
    • Shifter
      • Bit shifting was relatively easy, I simply took the top or bottom n bits and put them on a wire, then set the reminaing bits on that wire accordingly
      • Each shifter has a mux built in that can be activated by a 1 bit selector
      • I chained each of my barrel shifters together directly and split up the 5-bit shift amount to each shifter
    • Comparator
      • I used the comparator we created in lab and modified it to less than instead of greater than
      • I left equals as is and simply notted the value before I passed it to the alu output
  • Multiplier
    • Counter
      • I implemented the counter like we did in class using t flip flops. I expected to only need a 4 bit counter for modified booth's algorithm but I ended up needing a 5 bit counter because I needed to count up to 16 cycles to generate the correct answer.
    • Special modules
      • I needed to make a 65 bit register to hold my product.
      • I also needed to make a 4 bit mux that takes in 65 bits of data. In order to create this module I also needed a 2 bit mux that takes in 65 bits of data.
      • I detect resets when mult or div is enabled. This works for the current checkpoint but may need to be modified to implement division
      • The special case checker module was also needed to ensure the overflow bit was correct when the operands were -32768 and 65536. This is the only case of incorrect overflow I could find.

Bypassing

  • I was able to get bypassing working for most instructions where the value being read from memory was being or going to be updated in the memory or writeback stages.
  • The most difficult part of the bypassing logic was checking the correct parts of the intruction for its instruction type. I used a 2 bit mux to select the correct part of the instruction to check for the instruction type, which changed the values being compared to rd for i type instructions.
  • I used a 2 bit wire to select where to bypass from. 1x means do not bypass, 00 means bypass from memory, 01 means bypass from writeback.
    • These values are calculated inside my Bypass module and used in the processor module.
  • dmem bypassing was very simple to implement, I only had to check if the memory RD was the same as the writeback RD.

Stalling

  • There were many bugs that I ran into while debugging sort.s that I could not figure out how to bypass. I ended up having to implement stalling for the following instruction sequences:
    • Load:
      • lw $A, 0($B)
      • add $A, $A, $C
    • Jump Return:
      • lw $A, 0($B)
      • jr $A
    • Save:
      • sw $A, 0($B)
      • sw $C, 0($A)
  • In order to stall, I had a wire for each stage in the pipeline that would be set to 1 if the stage was stalled. These wires were notted inside the module and disabled the write for that set of registers.
  • I also needed to stall the entire processor whenever a mult or div instruction got to the execute stage. I did not implement a separate pipeline stage for multdiv, I instead used the inputs for the ALU and selected the output from multdiv when it was complete to send to the next stage.

Optimizations

  • I optimized my hardware cost by using as few additional adders as possible. I had one in my ALU, one for PC + 1, and one to calculate branch addresses.
  • Further optimizations could likely be made by reducing the amount of stalling that occurs, but I was not able to get this working in time for the deadline.
  • Another optimization I thought of would be allowing non-conflicting instructions to pass through a multdiv operation. This would require a separate pipeline latch and would likely be more difficult to implement due to the complexity of detecting conflicts during a multi-cycle operation.

Bugs

  • I had an issue with the overflow of the multiplier when the operands were -32768 and 65536. The overflow should have been 0 but 1 was outputted. I could not figure out why this was because result is correct but in order to fix this issue I have a special module that will detect when these inputs are passed to ensure that the overflow bit stays 0.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published