Fix GPTQ for inputs with batch size != 1 and with seq len == 1 #3002
LGTM
This PR resolves 155538.
@@ -264,7 +264,7 @@ def _quantize_weights(
             scales.append(scale)
         else:
             if self._scale_estimation and block_compression_config.num_bits == 4:
-                activations = [inp.squeeze()[:, (i1 + i) : (i1 + i + group_size)] for inp in inputs]
+                activations = [inp[..., (i1 + i) : (i1 + i + group_size)] for inp in inputs]
Slice by the last dimension, which is expected to be the hidden dimension.
This is aligned with how statistics are processed in activations_to_wc_statistics,
where the reduction axes are all dimensions except the last one. @nikita-savelyevv
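A minimal sketch of the idea in this comment, using NumPy arrays as stand-ins for the tensors that GPTQ receives (an assumption; NNCF uses its own tensor wrapper). The helper names `slice_hidden` and `reduce_all_but_last` are hypothetical, and the mean of absolute values is only a placeholder reduction, not necessarily what activations_to_wc_statistics computes. It shows that slicing the last axis and reducing over every other axis give a per-channel vector for any batch size and sequence length.

```python
import numpy as np


def slice_hidden(inp, start, group_size):
    # Slice along the last axis only, leaving batch and sequence axes untouched.
    return inp[..., start : start + group_size]


def reduce_all_but_last(inp):
    # Reduce over every axis except the last (hidden) one.
    # The reduction itself (mean of absolute values) is a placeholder.
    axes = tuple(range(inp.ndim - 1))
    return np.mean(np.abs(inp), axis=axes)


for shape in [(3, 5, 16), (1, 1, 16)]:
    group = slice_hidden(np.ones(shape), start=4, group_size=4)
    # Leading axes are preserved; only the hidden axis shrinks to the group size.
    assert group.shape == shape[:-1] + (4,)
    # One statistic per channel in the group, whatever the leading axes are.
    assert reduce_all_but_last(group).shape == (4,)
```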
[
    LMLinearModel.INPUT_SHAPE,
    [3, 5, 16],
    [1, 1, 16],
Added a test case for bug 155538 with the tiny-llama example. The root cause is unfiltered data with sequence length == 1.
@kshpv
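The sequence length == 1 corner case can be reproduced in isolation. The sketch below uses NumPy arrays in place of NNCF tensors (an assumption), and `old_slice` / `new_slice` are hypothetical names for the two expressions from the diff. For a `[1, 1, 16]` input, `squeeze()` drops both leading axes, so the two-index slice no longer has two axes to index.

```python
import numpy as np


def old_slice(inp, start, group_size):
    # Expression before the fix: squeeze first, then slice the second axis.
    return inp.squeeze()[:, start : start + group_size]


def new_slice(inp, start, group_size):
    # Expression after the fix: slice the last axis directly.
    return inp[..., start : start + group_size]


single_token = np.zeros((1, 1, 16))

# squeeze() collapses [1, 1, 16] to a 1-D array of 16 elements.
assert single_token.squeeze().shape == (16,)

try:
    old_slice(single_token, 0, 4)
    old_failed = False
except IndexError:
    # A 1-D array cannot be indexed with two indices.
    old_failed = True

assert old_failed
assert new_slice(single_token, 0, 4).shape == (1, 1, 4)
```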
"input_shape",
[
    LMLinearModel.INPUT_SHAPE,
    [3, 5, 16],
Test case to cover the Stable Diffusion (SD) case with batch size != 1.
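For batch size != 1 the old expression does not raise; it picks the wrong axis. The sketch below again uses NumPy arrays as stand-ins for NNCF tensors (an assumption), with hypothetical helper names. With a `[3, 5, 16]` input, `squeeze()` is a no-op, so `[:, a:b]` slices the sequence axis instead of the hidden one.

```python
import numpy as np


def old_slice(inp, start, group_size):
    # Expression before the fix.
    return inp.squeeze()[:, start : start + group_size]


def new_slice(inp, start, group_size):
    # Expression after the fix.
    return inp[..., start : start + group_size]


batched = np.zeros((3, 5, 16))

# No axis has size 1, so squeeze() changes nothing and axis 1 (sequence) is sliced.
old_result = old_slice(batched, 0, 4)
assert old_result.shape == (3, 4, 16)

# Slicing the last axis keeps batch and sequence intact and selects 4 hidden channels.
new_result = new_slice(batched, 0, 4)
assert new_result.shape == (3, 5, 4)
```

The silent shape mismatch is arguably the harder failure to notice, since nothing fails at the slicing step itself.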
Tested locally with different input shapes. It works.
Changes
GPTQ now correctly processes inputs with batch size != 1 and inputs where both batch size and sequence length equal 1.
Also changed the errors we are raising in NNCF from built-in Python errors to NNCF-specific ones.
Reason for changes
Stable Diffusion models, e.g. runwayml/stable-diffusion-v1-5, have linear layers whose inputs have the following shape: [2*num_images_in_prompt, text_embedding_size, hidden_dimension].
The example at https://github.com/openvinotoolkit/nncf/blob/develop/examples/llm_compression/openvino/tiny_llama/main.py uses unfiltered data from wikitext, which leads to the corner case with sequence length == 1.
Related tickets
150851, 155538
Tests
CI