chjej202 changed the title from "30% Increase in Accuracy After INT8 Quantization for the ViT Example in ONNX_PTQ" to "FP16 and FP32 shows 30% lower accuracy compared to INT8 for the ViT Example in ONNX_PTQ" on Nov 14, 2024.
I followed the directions in the README.md file in the onnx_ptq directory.
I successfully obtained the vit_base_patch16_224.quant.onnx file and got the following evaluation accuracy:
The top-1 accuracy of the model is 84.51%.
To compare the quantization result with FP16 and FP32 precision of the same network, I built an engine and measured the accuracy for each precision (FP32 and FP16), as sketched below.
Instead of using the vit_base_patch16_224.quant.onnx file, I used the original ONNX file (vit_base_patch16_224.onnx, which can be downloaded with the download_example_onnx.py script) to create an engine and evaluate the accuracy.
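A minimal sketch of this engine-build step, using the standard TensorRT Python API (the engine file names are placeholders, and this illustrates the build rather than the exact commands from the example):

```python
import tensorrt as trt


def build_engine(onnx_path: str, engine_path: str, fp16: bool = False) -> None:
    """Parse an ONNX model and serialize a TensorRT engine to disk."""
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    # ONNX models require an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError(f"Failed to parse {onnx_path}")

    config = builder.create_builder_config()
    if fp16:
        # FP16 build; leaving the flag unset keeps the default FP32 precision.
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)


# One FP32 and one FP16 engine from the unquantized model (file names are placeholders).
build_engine("vit_base_patch16_224.onnx", "vit_fp32.engine", fp16=False)
build_engine("vit_base_patch16_224.onnx", "vit_fp16.engine", fp16=True)
```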
For FP32, I got the following accuracy:
The top-1 accuracy of the model is 58.16%.
For FP16, I got the following accuracy:
The top-1 accuracy of the model is 58.29%.
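The numbers above are plain ImageNet top-1 measurements over the built engines. A minimal sketch of such an evaluation, assuming Polygraphy for inference and timm's preprocessing for this checkpoint (the dataset path, engine file name, and the input tensor name "input" are assumptions, and the example's own evaluation script may differ):

```python
import numpy as np
import timm
from polygraphy.backend.trt import EngineFromBytes, TrtRunner
from timm.data import create_transform, resolve_data_config
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Preprocessing matching the timm vit_base_patch16_224 checkpoint (resize, crop, normalize).
model = timm.create_model("vit_base_patch16_224", pretrained=False)
transform = create_transform(**resolve_data_config({}, model=model))

# Assumes an ImageNet validation set laid out as standard class folders; the path is a placeholder.
dataset = ImageFolder("imagenet/val", transform=transform)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

with open("vit_fp32.engine", "rb") as f:  # or vit_fp16.engine
    runner = TrtRunner(EngineFromBytes(f.read()))

correct = 0
with runner:
    for image, label in loader:
        # "input" is an assumed tensor name; check the actual input name in the ONNX model.
        outputs = runner.infer({"input": image.numpy()})
        logits = next(iter(outputs.values()))
        correct += int(np.argmax(logits, axis=-1)[0] == label.item())

print(f"The top-1 accuracy of the model is {100.0 * correct / len(dataset):.2f}%")
```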
Both the FP16 and FP32 builds of the ViT network show roughly 30% lower accuracy than the INT8-quantized network (84.51%). Why does this happen?