
Fix (optimum): Fix gptq and ONNX export #810

Merged
@Giuseppe5 merged 2 commits into Xilinx:optimum from optimum_improv on Jan 30, 2024

Conversation

@Giuseppe5 (Collaborator) commented on Jan 25, 2024

Users can now specify the suffixes of layer names that can be run in parallel during GPTQ, to speed up that computation.
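For illustration, here is a minimal sketch of how layers could be grouped by name suffix so GPTQ can update them together; the helper, the suffix values, and the parameter names are assumptions made up for this example, not the exact API added by this PR.

```python
from collections import defaultdict
import torch.nn as nn

def group_layers_by_suffix(model: nn.Module, suffixes):
    """Collect quantizable layers whose qualified name ends with one of the given suffixes."""
    groups = defaultdict(list)
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Linear, nn.Conv2d)):
            continue
        for suffix in suffixes:
            if name.endswith(suffix):
                groups[suffix].append((name, module))
    return groups

# In a transformer block the q/k/v projections see the same input, so once they
# are grouped together their GPTQ updates can be computed in parallel, e.g.:
# groups = group_layers_by_suffix(model, ["q_proj", "k_proj", "v_proj"])
```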

This PR also fixes the export of ONNX files larger than 2 GB. Exporting the weights as integers causes multiple parameters to appear in the model (the original FP weights and the integer ones), which PyTorch does not handle correctly during export.
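For context, a single ONNX protobuf file is limited to 2 GB, so larger models have to store their tensors as external data. A minimal sketch of that mechanism with the onnx Python API (file names are illustrative):

```python
import onnx

# Re-save the model with all tensors stored in a separate external data file,
# which lifts the 2 GB protobuf limit on the .onnx file itself.
model = onnx.load("model.onnx", load_external_data=False)
onnx.save_model(
    model,
    "model_external.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model_weights.bin",
)
```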

This PR restores the possibility of exporting the FP weights with a QDQ pattern (QuantizeLinear followed by DequantizeLinear), instead of integer weights with a single DequantizeLinear. Since the export then only has to handle the original parameters of the model, everything appears to work correctly.
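To make the two representations concrete, below is a minimal numpy sketch of the difference; the scale and zero point are made up for the example.

```python
import numpy as np

w_fp = np.random.randn(4, 4).astype(np.float32)
scale, zero_point = np.float32(0.02), np.int32(0)

# QDQ: the original FP weight stays in the graph and is quantized and then
# immediately dequantized at runtime (QuantizeLinear -> DequantizeLinear).
w_q = np.clip(np.round(w_fp / scale) + zero_point, -128, 127)
w_qdq = (w_q - zero_point) * scale

# Integer weight + DQ: the quantized weight is stored as an int8 initializer and
# only a DequantizeLinear node remains, which leaves a second copy of the weight
# (the int8 tensor next to the original FP one) to be handled during export.
w_int8 = w_q.astype(np.int8)
w_dq = (w_int8.astype(np.float32) - zero_point) * scale

assert np.allclose(w_qdq, w_dq)
```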

@Giuseppe5 changed the title from "Export Fix" to "Fix (optimum): Fix 2GB ONNX export error" on Jan 25, 2024
@Giuseppe5 force-pushed the optimum_improv branch 2 times, most recently from 49bfd35 to 28ffb88 on January 30, 2024 at 12:13
@Giuseppe5 changed the title from "Fix (optimum): Fix 2GB ONNX export error" to "Fix (optimum): Fix gptq and ONNX export" on Jan 30, 2024
@Giuseppe5 merged commit 883e193 into Xilinx:optimum on Jan 30, 2024
22 checks passed