Fix (optimum): Fix gptq and ONNX export #810

Giuseppe5 · 2024-01-25T19:43:10Z

Users can now specify the suffix of the layer names that can be run in parallel in GPTQ, which speeds up that computation.
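As a rough illustration of selecting layers by name suffix, the sketch below groups consecutive layer names that end with one of the user-supplied suffixes, and leaves every other layer in a group of its own. The function `group_parallel_layers` and the layer names are hypothetical and are not taken from this PR's code; the actual option name and grouping logic may differ.

```python
from typing import Iterable, List, Sequence


def group_parallel_layers(
    layer_names: Iterable[str], suffixes: Sequence[str]
) -> List[List[str]]:
    """Group consecutive names matching any suffix; other names are singletons.

    Hypothetical helper for illustration only, not part of any library API.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for name in layer_names:
        if any(name.endswith(suffix) for suffix in suffixes):
            # Matching layers accumulate into one group processed together.
            current.append(name)
        else:
            # A non-matching layer closes the open group and stands alone.
            if current:
                groups.append(current)
                current = []
            groups.append([name])
    if current:
        groups.append(current)
    return groups


# Made-up layer names in the style of a transformer block.
names = [
    "layers.0.self_attn.q_proj",
    "layers.0.self_attn.k_proj",
    "layers.0.self_attn.v_proj",
    "layers.0.self_attn.out_proj",
    "layers.0.fc1",
]
groups = group_parallel_layers(names, suffixes=("q_proj", "k_proj", "v_proj"))
for group in groups:
    print(group)
```

Here the three projection layers land in one group, which is the kind of set that could plausibly be processed together because they consume the same input; the remaining layers are processed one at a time.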

This PR also fixes the export of ONNX files larger than 2 GB. Exporting the weights as integers causes multiple parameters (the original FP weights and the integer ones) to appear in the model, and PyTorch does not handle them correctly during export.

This PR restores the option of exporting the FP weights + QDQ (instead of integer weights + DQ). Because only the model's original parameters then need to be handled, everything seems to work fine.
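To make the two export styles concrete, here is a minimal numeric sketch in plain Python. It assumes symmetric 8-bit quantization with a single scale, and the helpers `quantize` and `dequantize` are illustrative only, not the export code from this PR. It shows that both styles yield the same effective weights; what changes is which tensor is stored as a parameter.

```python
from typing import List


def quantize(weights: List[float], scale: float, zero_point: int = 0, bits: int = 8) -> List[int]:
    """Map FP values to clamped signed integers (illustrative helper)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]


def dequantize(q_values: List[int], scale: float, zero_point: int = 0) -> List[float]:
    """Map integers back to FP values (illustrative helper)."""
    return [(q - zero_point) * scale for q in q_values]


fp_weights = [0.5, -1.0, 0.26, 2.0]  # the model's original parameters
scale = 0.25  # assumed per-tensor scale

# Integer weight + DQ: a second, integer copy of the weights is stored,
# and only a dequantize step is applied on top of it.
int_weights = quantize(fp_weights, scale)
effective_int_dq = dequantize(int_weights, scale)

# FP weights + QDQ: only the original FP weights are stored, and a
# quantize step followed by a dequantize step is applied to them.
effective_fp_qdq = dequantize(quantize(fp_weights, scale), scale)

print(int_weights)
print(effective_int_dq == effective_fp_qdq)
```

In the second style no extra integer tensor exists alongside the original weights, which matches the description above of only having to handle the model's original parameters.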

Giuseppe5 force-pushed the optimum_improv branch from 4b6b2d6 to 02e701a · January 25, 2024 19:45

Giuseppe5 changed the title ~~Export Fix~~ Fix (optimum): Fix 2GB ONNX export error Jan 25, 2024

Giuseppe5 force-pushed the optimum_improv branch from 95f97b7 to 5b340cc · January 30, 2024 09:43

Giuseppe5 force-pushed the optimum branch from 9b26ace to 0677fb2 · January 30, 2024 11:08

Giuseppe5 force-pushed the optimum_improv branch 2 times, most recently from 49bfd35 to 28ffb88 · January 30, 2024 12:13

Giuseppe5 added 2 commits January 30, 2024 12:20

Fix 2GB ONNX export error

f5005d0

Fix gptq + speedup

278d30d

Giuseppe5 force-pushed the optimum_improv branch from 28ffb88 to 278d30d · January 30, 2024 12:21

Giuseppe5 changed the title ~~Fix (optimum): Fix 2GB ONNX export error~~ Fix (optimum): Fix gptq and ONNX export Jan 30, 2024

Giuseppe5 merged commit 883e193 into Xilinx:optimum Jan 30, 2024
22 checks passed
