repacker greatly alters one LUT in 2xLUT5 mode of an FLE in some cases when other LUT5 is unused yet configured as a wire-LUT in the .net #1672

Open
rs-dhow opened this issue May 16, 2024 · 0 comments

rs-dhow commented May 16, 2024

Assume an architecture with CLBs. Each has multiple FLEs. Each FLE has a 2xLUT5 mode, i.e., 2 LUT5s share 5 inputs and have separate outputs. There are other modes not relevant here.

The VPR packer writes out a .net file with several FLEs in the 2xLUT5 mode. Let's consider an FLE where (i) one LUT5 is configured in the .net as a wire-LUT but seemingly unused and (ii) the other LUT5 in the same FLE is configured as a function and its output does go somewhere.

In this case, OpenFPGA's repacker may do both of these:

  1. Change which input the wire-LUT follows.
  2. "Alter" the function in the other used LUT in the same FLE.

(1) may be done in order to enable the change in (2), which seems to always be an input duplication. Perhaps the duplicated input comes through feedback from the unused LUT5. (?)

We will call the 2 LUT5s the wire-LUT and the function LUT.

What does "alter" mean? Say the function LUT5 implements a function that appears as a LUT3 in the netlist, and say its LUT mask has 3 zeroes and 5 ones. When this is expanded from a LUT3 to a LUT5, the two added inputs are don't-cares, so each LUT3 entry is replicated 4 times: regardless of the port_rotation_map, the zeroes should increase 4x from 3 to 12 and the ones should increase 4x from 5 to 20. (4x comes from 2^(5-3), where 5 comes from LUT5 and 3 comes from LUT3.) In other words, both counts increase by the same multiplier, which depends only on the difference between the LUT size in the netlist and the LUT size in the hardware.
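The expected expansion can be sketched as follows. This is only an illustration under stated assumptions: `expand_with_dont_cares` and `count_ones` are hypothetical helpers (not OpenFPGA functions), the mask is held in a 32-bit word, and bit `addr` of the mask is taken to be the output for input pattern `addr` with in[0] as the least significant bit.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of OpenFPGA: widen a k-input truth table to
// n inputs (k <= n <= 5). Bit `addr` of a mask is the LUT output when the
// inputs, read as a binary number with in[0] as the LSB, equal `addr`.
// The added inputs in[k..n-1] are treated as don't-cares.
uint32_t expand_with_dont_cares(uint32_t mask, int k, int n) {
    assert(0 < k && k <= n && n <= 5);
    uint32_t out = 0;
    for (uint32_t addr = 0; addr < (1u << n); ++addr) {
        uint32_t low = addr & ((1u << k) - 1u);  // only the original k inputs matter
        if ((mask >> low) & 1u) out |= (1u << addr);
    }
    return out;
}

// Number of 1 entries in a mask stored in a 32-bit word.
int count_ones(uint32_t mask) {
    return static_cast<int>(std::bitset<32>(mask).count());
}
```

For the example mask 0xF8 (3 zeroes, 5 ones), `expand_with_dont_cares(0xF8, 3, 5)` yields 0xF8F8F8F8, which has 20 ones and 12 zeroes. Permuting the inputs only permutes the addresses, so the counts would not change.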

However, the repacker doesn't increase the ones and zeroes equally. In some cases I have looked at, it appears to assume that two inputs of the expanded LUT3 --> LUT5 result are newly connected together on the function LUT5 (not the wire-LUT5). If two inputs always carry the same value, the bit positions in the function LUT5 where those inputs differ are never addressable, and the repacker seems to set them to 0. As a result, the number of zeroes increases more than the number of ones.
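To make the suspected behavior concrete, here is a minimal sketch of a mask built under the assumption that two inputs are tied together and the unreachable entries are left at 0. This models the hypothesis described above, not the repacker's actual code; `tied_input_mask` and `ones_in` are hypothetical names, and the bit ordering (in[0] as the LSB of the address) is an assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical model of the suspected repacker output: a k-input function is
// placed on an n-input LUT (k <= n <= 5) whose input `dup` is assumed to be
// wired to the same signal as input `src`. Entries where the two pins differ
// are unreachable under that assumption and are left at 0.
uint32_t tied_input_mask(uint32_t mask, int k, int n, int src, int dup) {
    assert(0 < k && k <= n && n <= 5);
    assert(0 <= src && src < k && k <= dup && dup < n);
    uint32_t out = 0;
    for (uint32_t addr = 0; addr < (1u << n); ++addr) {
        bool src_bit = (addr >> src) & 1u;
        bool dup_bit = (addr >> dup) & 1u;
        if (src_bit != dup_bit) continue;        // unreachable entry stays 0
        uint32_t low = addr & ((1u << k) - 1u);  // original k inputs
        if ((mask >> low) & 1u) out |= (1u << addr);
    }
    return out;
}

// Number of 1 entries in a mask stored in a 32-bit word.
int ones_in(uint32_t mask) {
    return static_cast<int>(std::bitset<32>(mask).count());
}
```

With the LUT3 mask 0xF8 and in[3] tied to in[2], `tied_input_mask(0xF8, 3, 5, 2, 3)` gives 0xF008F008: 10 ones and 22 zeroes, instead of the 20 ones and 12 zeroes expected from a pure don't-care expansion.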

More importantly, this may create a new logic hazard. Suppose the external signal reaches the duplicated inputs with different delays, and consider the transition 111 --> 011 on the in[2:0] inputs (as in the .net), where the function is 1 both before and after. If the repacker newly duplicates in[2] onto in[3], the hardware sees this as 1111 --> 0011 on in[3:0]. With different delays to in[2] and in[3], the inputs may temporarily pass through 0111 or 1011. These are the supposedly unaddressable bits, which are zero in the LUT. The result is a 0 pulse on the output whose duration is roughly the difference between the two input delays.
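The hazard described above can be checked mechanically on a mask. The sketch below is an assumption-laden illustration, not OpenFPGA code: `lut_out` and `count_glitch_transitions` are hypothetical helpers, the mask uses in[0] as the LSB of the address, and only transitions of the shared signal from 1 to 0 (both tied pins high to both low) are counted.

```cpp
#include <cassert>
#include <cstdint>

// Output of a LUT whose truth table is stored in `mask`, for input pattern `addr`.
bool lut_out(uint32_t mask, uint32_t addr) { return (mask >> addr) & 1u; }

// Hypothetical check: pins `src` and `dup` of an n-input LUT carry the same
// signal with different delays. Count the start states (both pins at 1) where
// the output is 1 before and after the shared signal falls, but reads 0 in at
// least one intermediate state in which only one pin has changed so far.
int count_glitch_transitions(uint32_t hw_mask, int n, int src, int dup) {
    assert(0 < n && n <= 5 && src != dup);
    const uint32_t both = (1u << src) | (1u << dup);
    int count = 0;
    for (uint32_t hi = 0; hi < (1u << n); ++hi) {
        if ((hi & both) != both) continue;       // start with both tied pins at 1
        uint32_t lo = hi & ~both;                // end with both tied pins at 0
        if (!lut_out(hw_mask, hi) || !lut_out(hw_mask, lo)) continue;
        uint32_t mid_a = hi & ~(1u << src);      // only `src` has fallen
        uint32_t mid_b = hi & ~(1u << dup);      // only `dup` has fallen
        if (!lut_out(hw_mask, mid_a) || !lut_out(hw_mask, mid_b)) ++count;
    }
    return count;
}
```

For a 4-input view with in[3] tied to in[2], the mask 0xF008 (unreachable entries zeroed) reports one such transition, namely 1111 --> 0011 passing through 0111 or 1011, while the don't-care expansion 0xF8F8 reports none.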

We have proven that each observed instance of the repacker behavior we reported in #1670 and #1671 does not change functionality (but may change timing). This case is much more complicated, and we cannot tell if functionality is preserved.

Most importantly, however, we see no reason for the repacker to be making these changes. The preferable solutions are:

  1. Always follow the .net file faithfully.
  2. Have an option that follows .net faithfully.

Due to the possibility of a new logic hazard, we do not think what the repacker is now doing is acceptable, and writing out a new .net file with the repacker's changes would not be adequate. Please pursue (1) or (2).

Thanks for your attention.

Addendum.

Adding the following assertion may be helpful in finding this problem.

// lut_mask comes from the netlist
int mask_zeroes = count_bits(lut_mask, 0);
int mask_ones = count_bits(lut_mask, 1);
// lut5_mask comes from the result repacked to the physical hardware
int lut5_zeroes = count_bits(lut5_mask, 0);
int lut5_ones = count_bits(lut5_mask, 1);
// check for the intended result with added inputs being don't-cares.
int growth = 1 << (lut5_mask_inputs - lut_mask_inputs);
assert(mask_zeroes * growth == lut5_zeroes);
assert(mask_ones * growth == lut5_ones);

This assumes a function count_bits(lutmaskbits, bitvalue) which does the obvious thing.
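One possible shape for `count_bits`, together with the check wrapped in a function, is sketched below. The container type is an assumption: the mask is modeled as a `std::vector<bool>` with one entry per truth-table row, which may differ from how the repacker actually stores it. `mask_from_word` and `mask_growth_is_uniform` are hypothetical helpers added only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Count the entries of a LUT mask equal to `bitvalue`.
// Assumption: the mask is a vector<bool> with 2^inputs entries.
int count_bits(const std::vector<bool>& lutmaskbits, bool bitvalue) {
    return static_cast<int>(
        std::count(lutmaskbits.begin(), lutmaskbits.end(), bitvalue));
}

// Hypothetical convenience: unpack the low 2^inputs bits of a word into a mask.
std::vector<bool> mask_from_word(uint32_t word, int inputs) {
    assert(0 < inputs && inputs <= 5);
    std::vector<bool> mask(1u << inputs);
    for (uint32_t i = 0; i < mask.size(); ++i) mask[i] = (word >> i) & 1u;
    return mask;
}

// True when zeroes and ones both grew by 2^(lut5_mask_inputs - lut_mask_inputs),
// i.e. when the counts are consistent with added inputs being don't-cares.
bool mask_growth_is_uniform(const std::vector<bool>& lut_mask, int lut_mask_inputs,
                            const std::vector<bool>& lut5_mask, int lut5_mask_inputs) {
    assert(lut_mask_inputs <= lut5_mask_inputs);
    int growth = 1 << (lut5_mask_inputs - lut_mask_inputs);
    return count_bits(lut_mask, false) * growth == count_bits(lut5_mask, false) &&
           count_bits(lut_mask, true) * growth == count_bits(lut5_mask, true);
}
```

Note that matching counts are a necessary condition only: two masks can have the same number of ones and zeroes and still implement different functions, so this check flags the problem but does not prove equivalence.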
