remove to restriction for 4-bit model #33122
Conversation
Thanks @SunMarc! I've tested moving between gpu->cpu->gpu, but not yet on multiple GPUs. We'll still see a warning from accelerate:

Reference note: this should fix #24540 for 4bit. For 8bit there is still a blocker: bitsandbytes-foundation/bitsandbytes#1332; once that's fixed & released on the bitsandbytes side we can do an additional PR.
Co-authored-by: Matthew Douglas <[email protected]>
src/transformers/modeling_utils.py
Outdated
  if getattr(self, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES:
      if getattr(self, "is_loaded_in_4bit", False):
-         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.0"):
+         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.2"):
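The bumped check gates device movement on the installed bitsandbytes version. The same version-gating logic can be sketched in isolation as follows (the helper name `bnb_supports_device_movement` is hypothetical, not from the PR; it assumes the `packaging` library is available):

```python
from packaging import version

# Minimum bitsandbytes release that supports moving 4-bit quantized models
# across devices; the PR bumps this from 0.43.0 to 0.43.2, since the
# required bitsandbytes change only landed in 0.43.2.
_MIN_BNB_VERSION = "0.43.2"

def bnb_supports_device_movement(installed: str) -> bool:
    """Return True if the given bitsandbytes version string is new enough
    to allow .to()/.cuda() on a 4-bit quantized model."""
    return version.parse(installed) >= version.parse(_MIN_BNB_VERSION)
```

`version.parse` handles pre-release and multi-digit components correctly, which is why it is preferred over string comparison here.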
@SunMarc I've bumped this to 0.43.2 since that's when bitsandbytes-foundation/bitsandbytes#1279 was landed.
Nice, thanks for updating the PR!
Thanks for the PR! This looks good.
src/transformers/modeling_utils.py
Outdated
  raise ValueError(
      "Calling `cuda()` is not supported for `4-bit` quantized models. Please use the model as it is, since the"
      " model has already been set to the correct devices and casted to the correct `dtype`. "
      "However, if you still want to move the model, you need to install bitsandbytes >= 0.43.2 "
  )
The warning isn't super clear to me in terms of what the user should or should not do: should they install the new version, or should they just leave the model where it is? I'd try to clarify this a bit.
Good feedback, thanks! Updated. I think in most cases the user would be calling .cuda() without realizing the model is already on a GPU, so I put the current model.device in the message. That should help them decide whether they really meant to move it somewhere else and need to upgrade.
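The error message described above could be assembled along these lines (a minimal sketch; the helper name `unsupported_move_message` is hypothetical and not part of the PR):

```python
def unsupported_move_message(current_device: str) -> str:
    """Build an error message for unsupported .to()/.cuda() calls on a
    4-bit quantized model, including the model's current device so the
    user can tell whether the move was intentional."""
    return (
        "Calling `cuda()` is not supported for `4-bit` quantized models "
        "with the installed version of bitsandbytes. The current device "
        f"is `{current_device}`. If you intended to move the model, "
        "please install bitsandbytes >= 0.43.2."
    )
```

Including the live device string directly addresses the reviewer's concern: the user can see at a glance that the model is already on, say, `cuda:0`.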
* remove to restriction for 4-bit model
* Update src/transformers/modeling_utils.py (Co-authored-by: Matthew Douglas <[email protected]>)
* bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda
* quality fix
* Improve warning message for .to() and .cuda() on bnb quantized models

Co-authored-by: Matthew Douglas <[email protected]>
What does this PR do?
Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we previously put in place.
Needs to be tested. cc @matthewdouglas
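After this change, the workflow the PR enables might look like the following sketch. The model name is a placeholder assumption, and the snippet requires transformers, bitsandbytes >= 0.43.2, and a CUDA device, so it is illustrative rather than something to run as-is:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config; NF4 is a common choice (illustrative, not
# mandated by the PR).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

# Placeholder model name -- substitute your own checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)

# With bitsandbytes >= 0.43.2 installed, these moves no longer raise a
# ValueError as they did before this PR.
model = model.to("cpu")
model = model.cuda()
```

On older bitsandbytes releases the `.to("cpu")` and `.cuda()` calls would hit the ValueError shown earlier in the diff.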