[AWQ] Cast `fns.quantile()` result to float32 #3044

nikita-savelyevv · 2024-10-28T14:06:28Z

Changes

Cast the result of `fns.quantile()` to float32 inside the AWQ algorithm.

Reason for changes

With the NumPy backend, `fns.quantile()` returns an `np.float64` value. AWQ uses this value as the lower bound of a clip operation, so the clipped result is also float64. The promotion then propagates through the following operations, and the weights and activations end up converted to float64 as well.
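For illustration, here is a minimal NumPy sketch of the promotion chain and the effect of the cast. It is not the NNCF code: the array names and values are made up, and the lower bound is a 1-element array so that the result does not depend on NumPy's scalar promotion rules, which differ between NumPy 1.x and 2.x.

```python
import numpy as np

# Hypothetical stand-ins for AWQ scales and weights, both float32.
scales = np.linspace(0.01, 2.0, 8, dtype=np.float32)
weights = np.ones((4, 8), dtype=np.float32)

# Stand-in for an uncast quantile result: a float64 lower bound.
lower_f64 = np.array([0.1], dtype=np.float64)

# Without the cast: the float64 bound promotes the clipped scales,
# and every later product with them becomes float64 too.
clipped_f64 = np.clip(scales, lower_f64, 100.0)
scaled_f64 = weights * clipped_f64

# With the cast: everything stays in float32.
lower_f32 = lower_f64.astype(np.float32)
clipped_f32 = np.clip(scales, lower_f32, 100.0)
scaled_f32 = weights * clipped_f32

print(clipped_f64.dtype, scaled_f64.dtype)
print(clipped_f32.dtype, scaled_f32.dtype)
```

In this sketch the first pair of arrays is float64 and the second pair is float32, which mirrors the before/after behavior described above.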

As far as I understand, processing in float64 is not necessary here, and it increases running time. The table below shows compression time with AWQ enabled before (develop) and after (branch) the changes.

| Model | develop (sec.) | branch (sec.) |
|---|---|---|
| tiny-llama-1.1b | 123 | 109 (-11%) |
| phi3_mini-3.7b | 487 | 419 (-14%) |
| llama3-8b | 1091 | 912 (-16%) |

nikita-savelyevv · 2024-10-29T05:42:46Z

The post-training weight compression conformance test shows no accuracy degradation (NNCF/job/manual/job/post_training_weight_compression/229). I compared it against develop build 230.

Cast quantile result to float32

0ccd926

github-actions bot added the NNCF PTQ label Oct 28, 2024

nikita-savelyevv marked this pull request as ready for review October 29, 2024 05:42

nikita-savelyevv requested a review from a team as a code owner October 29, 2024 05:42

nikita-savelyevv requested review from andreyanufr and ljaljushkin October 29, 2024 05:42

andreyanufr approved these changes Oct 29, 2024

ljaljushkin approved these changes Oct 29, 2024

ljaljushkin merged commit db3a935 into openvinotoolkit:develop Oct 30, 2024
14 checks passed
