Fix GPTQ for inputs with batch size != 1 and with seq len == 1 #3002

ljaljushkin · 2024-10-01T18:44:41Z

Changes

GPTQ now correctly processes inputs with batch size != 1, as well as inputs where both batch size and sequence length equal 1.
Also changed the errors raised in NNCF from built-in Python errors to NNCF-specific ones.

Reason for changes

Stable Diffusion models, e.g. runwayml/stable-diffusion-v1-5, feed linear layers with inputs of the following shape: [2*num_images_in_prompt, text_embedding_size, hidden_dimension].

https://github.com/openvinotoolkit/nncf/blob/develop/examples/llm_compression/openvino/tiny_llama/main.py
uses unfiltered data from wikitext, which leads to the corner case with sequence length == 1.

Related tickets

150851, 155538

Tests

test_compression_with_transposed_activations
test_compression_with_different_algo_combinations
test_raise_error_with_unsupported_params_for_e2m1
test_raise_error_with_unsupported_params_for_empty_dataset

CI

weight compression conformance

alexsu52

LGTM

nncf/quantization/algorithms/weight_compression/algorithm.py

kshpv · 2024-10-22T08:50:11Z

This PR resolves 155538

ljaljushkin · 2024-10-23T09:30:54Z

nncf/quantization/algorithms/weight_compression/gptq.py

@@ -264,7 +264,7 @@ def _quantize_weights(
                        scales.append(scale)
                    else:
                        if self._scale_estimation and block_compression_config.num_bits == 4:
-                            activations = [inp.squeeze()[:, (i1 + i) : (i1 + i + group_size)] for inp in inputs]
+                            activations = [inp[..., (i1 + i) : (i1 + i + group_size)] for inp in inputs]


Slice along the last dimension, which is expected to be the hidden one.
This is aligned with how statistics are processed in activations_to_wc_statistics, where the reduction axes are all dimensions except the last one. @nikita-savelyevv
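To illustrate the difference between the old and new slicing, here is a minimal sketch using plain NumPy arrays. NumPy is an assumption made for the illustration (the actual code operates on NNCF tensors), and the helper names `slice_old`, `slice_new`, and `per_channel_mean` are hypothetical. The shapes are taken from the test cases in this PR, with a hidden size of 16 and an assumed group size of 4.

```python
import numpy as np

GROUP_SIZE = 4  # assumed group size, for illustration only
START = 0       # stands in for (i1 + i) in the diff


def slice_old(inp, start, group_size):
    # Old behaviour: squeeze first, then slice the second axis.
    return inp.squeeze()[:, start:start + group_size]


def slice_new(inp, start, group_size):
    # New behaviour: always slice the last (hidden) axis.
    return inp[..., start:start + group_size]


def per_channel_mean(x):
    # Reduce over every axis except the last one.
    return x.mean(axis=tuple(range(x.ndim - 1)))


single = np.zeros((1, 5, 16))   # batch size 1
batched = np.zeros((3, 5, 16))  # batch size 3

# With batch size 1, squeeze() drops the batch axis, so both variants
# end up slicing the hidden axis.
old_single_shape = slice_old(single, START, GROUP_SIZE).shape    # (5, 4)
new_single_shape = slice_new(single, START, GROUP_SIZE).shape    # (1, 5, 4)

# With batch size 3, squeeze() is a no-op and the old code slices the
# sequence axis instead of the hidden one.
old_batched_shape = slice_old(batched, START, GROUP_SIZE).shape  # (3, 4, 16)
new_batched_shape = slice_new(batched, START, GROUP_SIZE).shape  # (3, 5, 4)

# Reducing the new slice over all axes but the last gives one value per
# hidden channel in the group, regardless of batch size.
stats_shape = per_channel_mean(slice_new(batched, START, GROUP_SIZE)).shape  # (4,)

print(old_single_shape, new_single_shape)
print(old_batched_shape, new_batched_shape)
print(stats_shape)
```

In this sketch the old variant returns a hidden-axis slice only when the batch size is exactly 1, while the ellipsis-based variant keeps the leading axes intact for any batch size.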

ljaljushkin · 2024-10-23T09:34:05Z

tests/openvino/native/quantization/test_weights_compression.py

+    [
+        LMLinearModel.INPUT_SHAPE,
+        [3, 5, 16],
+        [1, 1, 16],


Added a test case for bug 155538 with the tiny-llama example. The root cause is unfiltered data with sequence length == 1.
@kshpv
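The sequence length == 1 corner case can be reproduced in isolation. This is a rough sketch with plain NumPy arrays, which is an assumption: the real code works with NNCF tensors, so the exact error raised there may differ. The slice bounds 0:4 are illustrative.

```python
import numpy as np

# Shape from the new test case: batch 1, sequence length 1, hidden size 16.
inp = np.zeros((1, 1, 16))

# squeeze() removes both size-1 axes, leaving a 1D array.
squeezed = inp.squeeze()

# Indexing a 1D array with two indices fails in NumPy.
try:
    squeezed[:, 0:4]
    old_error = None
except IndexError as exc:
    old_error = exc

# Slicing the last axis directly works for any number of leading axes.
new_slice = inp[..., 0:4]

print(squeezed.shape)
print(type(old_error).__name__)
print(new_slice.shape)
```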

ljaljushkin · 2024-10-23T09:34:27Z

tests/openvino/native/quantization/test_weights_compression.py

+    "input_shape",
+    [
+        LMLinearModel.INPUT_SHAPE,
+        [3, 5, 16],


Test case to cover the Stable Diffusion case with batch size != 1.

andreyanufr

Tested locally with different input shapes. It works.

ljaljushkin requested a review from a team as a code owner October 1, 2024 18:44

github-actions bot added the NNCF PT, NNCF OpenVINO, and NNCF PTQ labels Oct 1, 2024

ljaljushkin requested review from alexsu52 and andreyanufr October 1, 2024 18:45

ljaljushkin force-pushed the nl/3d_activations_weight_compression branch from 91773fd to 9add286 Compare October 1, 2024 20:02

alexsu52 approved these changes Oct 7, 2024

ljaljushkin changed the title ~~Support for 3D activations in data-aware weight compression~~ Fix for inputs with batch size != 1 in data-aware weight compression Oct 8, 2024

ljaljushkin force-pushed the nl/3d_activations_weight_compression branch from 9add286 to 1214ce8 Compare October 21, 2024 13:32

ljaljushkin added 2 commits October 21, 2024 15:47

Fix for inputs with batch size != 1 in data-aware weight compression

1214ce8

fixed errors with backup_mode

2dc74db

andreyanufr approved these changes Oct 22, 2024

ljaljushkin changed the title ~~Fix for inputs with batch size != 1 in data-aware weight compression~~ Fix GPTQ for inputs with batch size != 1 and with seq len == 1 Oct 23, 2024

ljaljushkin requested review from alexsu52, andreyanufr and nikita-savelyevv October 23, 2024 09:28

ljaljushkin commented Oct 23, 2024

ljaljushkin added 2 commits October 23, 2024 11:37

supported corner case with batch and seq len =1

dba63e8

Merge remote-tracking branch 'origin/develop' into nl/3d_activations_…

0465015

…weight_compression

nikita-savelyevv approved these changes Oct 23, 2024

alexsu52 approved these changes Oct 24, 2024

andreyanufr approved these changes Oct 24, 2024

ljaljushkin merged commit 57e3891 into openvinotoolkit:develop Oct 24, 2024
14 checks passed
