-
Hey,
-
The reason why you see overflow is that, as mentioned earlier, the bit-width keeps growing layer after layer. If you modify the forward pass of the model to:

```python
def forward(self, obs):
    obs = quantize_input_tensor(obs, self.scale, training=self.training, device=self.device)
    net_out = self.fc1(obs)
    print(net_out.bit_width)
    net_out = self.relu1(net_out)
    net_out = self.fc2(net_out)
    print(net_out.bit_width)
    net_out = self.relu2(net_out)
    net_out = self.fc3(net_out)
    print(net_out.bit_width)
    return net_out
```

the output will be:

```
tensor(26.)
tensor(41.)
tensor(57.)
```

In Brevitas, if the input, weights, and bias of a quantized layer (e.g., a QuantLinear or QuantConv) are all quantized (and respect certain constraints), we compute the worst-case bit-width required to store the output of the operation. Since there is never any re-quantization step to bring the bit-width back down after the layer, it keeps growing and eventually overflows. In FPGA implementations we try to avoid this behaviour, since it reduces the advantages of quantization. Let me know whether this is a desired behaviour before we continue with the other issues.
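If you want the bit-width to stay bounded, the usual fix is to give each activation an explicit low-precision quantizer, so the wide accumulator output is re-quantized before it reaches the next layer. A minimal sketch of that pattern (not your model; layer sizes and the 8-bit choice are assumptions):

```python
import torch.nn as nn
from brevitas.nn import QuantIdentity, QuantLinear, QuantReLU

class RequantizedMLP(nn.Module):
    def __init__(self, in_features=10, hidden=64, out_features=4):
        super().__init__()
        # Quantize the floating-point input to 8 bits and propagate a QuantTensor
        self.input_quant = QuantIdentity(bit_width=8, return_quant_tensor=True)
        self.fc1 = QuantLinear(in_features, hidden, bias=True,
                               weight_bit_width=8, return_quant_tensor=True)
        # 8-bit activation quantizers: they re-quantize the wide accumulator
        # output, so the next layer sees an 8-bit QuantTensor again
        self.relu1 = QuantReLU(bit_width=8, return_quant_tensor=True)
        self.fc2 = QuantLinear(hidden, hidden, bias=True,
                               weight_bit_width=8, return_quant_tensor=True)
        self.relu2 = QuantReLU(bit_width=8, return_quant_tensor=True)
        self.fc3 = QuantLinear(hidden, out_features, bias=True,
                               weight_bit_width=8, return_quant_tensor=True)

    def forward(self, x):
        x = self.input_quant(x)
        x = self.relu1(self.fc1(x))   # bit_width back to 8 after relu1
        x = self.relu2(self.fc2(x))   # bit_width back to 8 after relu2
        return self.fc3(x)            # the final output is not re-quantized in this sketch
```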
-
I think the better alternative would be to have your FPGA implementation requantize the values to a lower bit-width to avoid overflow issues. What I can say is that FPGA implementations usually rely on low-precision computations, with more or less clever tricks to compensate for the overhead added by the requantization steps (I think I already mentioned FINN, which uses some of these techniques). In general, my experience with these tasks is to approach the problem the other way around: first design a quantized network with good accuracy (keeping in mind a few good-to-have FPGA constraints, e.g., the requantization steps to avoid overflow, as mentioned earlier), then implement that design on the FPGA and have it match your floating-point version. Usually some back and forth is necessary to maintain accuracy and optimize your FPGA implementation.
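As a rough illustration of what such a requantization step looks like in integer-only arithmetic (a generic sketch of the technique, not Brevitas or FINN code; the multiplier and shift would be derived from the input, weight, and output scales of the layer):

```python
import numpy as np

def requantize(acc, multiplier, shift, out_bits=8):
    """Fold a wide integer accumulator back down to a low bit-width.

    acc        : integer accumulator values (e.g. the int32/int64 matmul result)
    multiplier : integer fixed-point approximation of scale_in * scale_w / scale_out
    shift      : right shift that, together with `multiplier`, realizes that ratio
    """
    # Fixed-point rescale with rounding: (acc * multiplier + 2**(shift-1)) >> shift
    rescaled = (acc.astype(np.int64) * multiplier + (1 << (shift - 1))) >> shift
    # Saturate to the signed output range, e.g. [-128, 127] for 8 bits
    qmin, qmax = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return np.clip(rescaled, qmin, qmax)
```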
-
I tried the code in the above snippet #995 (reply in thread). However, I still got differences.
-
The output of the network will have a larger bit-width because it is the output of a linear layer: matrix multiplication between two int8 tensors requires a higher-precision accumulator. The bit-width that you see is a worst-case estimate, and if you need tighter bounds, Brevitas supports accumulator-aware quantization. With respect to the difference, I'm not quite sure the two models you're trying to compare are actually equivalent.
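For intuition about where that worst-case number comes from, here is a rough back-of-the-envelope bound (an illustrative calculation, not the exact formula Brevitas uses): a dot product of fan_in signed a-bit activations with signed w-bit weights can need roughly a + w + log2(fan_in) bits in the accumulator, which is why the bit-width grows after every linear layer unless you re-quantize.

```python
import math

def worst_case_accumulator_bits(input_bits, weight_bits, fan_in):
    """Rough worst-case bit-width needed to store the accumulator of a
    dot product of `fan_in` signed input_bits x weight_bits products."""
    # Largest possible |product| of two signed values
    max_abs_product = 2 ** (input_bits - 1) * 2 ** (weight_bits - 1)
    # Largest possible |sum| over fan_in such products
    max_abs_sum = fan_in * max_abs_product
    # Signed bits needed to cover [-max_abs_sum, +max_abs_sum]
    return math.ceil(math.log2(max_abs_sum + 1)) + 1

print(worst_case_accumulator_bits(8, 8, 512))  # -> 25
```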
-
Hello Brevitas team,
I have been working on a project involving quantized neural network models using Brevitas, and I have encountered some discrepancies between the model outputs and my simulated software implementation. I modified my approach to achieve more consistent results, and I would like to verify if I am using the library correctly. Below, I detail the original code, the modifications I made, and my specific questions.
Original Code

Here is the original code snippet I was using:

Here is the original implementation of the Digital_twin class:

In this original code, the out_brevitas_int and Y_pred_64 do not match, leading to discrepancies in the error calculations.

Modified Code

To address these discrepancies, I made the following modifications:
In the modified code, the quantize_input_tensor function is used to quantize the input tensor before passing it to the model.
Questions

1. Library Usage: Am I using the QuantLinear and QuantReLU layers correctly in conjunction with the QuantTensor for input quantization?
2. Input Tensor Quantization: Is the quantize_input_tensor function correctly implemented to quantize input tensors before passing them to the Brevitas model? How can I properly tune the scale passed to quantize_input_tensor?
3. Handling Data Types: In the modified code, the tensor obtained with .int() is int64 to avoid overflow issues. Is this an appropriate modification, or is there a better way to handle potential overflow in quantized models? (See the sketch after this list for what I mean.)
4. Why do I not get the same results from Brevitas and the software version when using QuantIdentity, while I do when using quantize_input_tensor?
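To make question 3 concrete, this is a minimal, self-contained sketch of the kind of int64 comparison I am doing (not my actual network; layer sizes and quantizer settings are simplified, and attribute access may differ slightly between Brevitas versions):

```python
import torch
from brevitas.nn import QuantIdentity, QuantLinear

input_quant = QuantIdentity(bit_width=8, return_quant_tensor=True)
fc1 = QuantLinear(10, 64, bias=False, weight_bit_width=8, return_quant_tensor=True)

obs = torch.randn(1, 10)
q_in = input_quant(obs)
q_out = fc1(q_in)

# Integer reference computation, accumulated in int64 to avoid overflow
w_int = fc1.quant_weight().int().to(torch.int64)
x_int = q_in.int().to(torch.int64)
Y_pred_64 = x_int @ w_int.t()

# Integer view of the Brevitas output, also cast to int64
out_brevitas_int = q_out.int().to(torch.int64)

# In this idealized bias-free case the two should agree (up to rounding)
print((out_brevitas_int - Y_pred_64).abs().max())
```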
Thank you for your time and assistance.
Best regards,
Giulio.