remove to restriction for 4-bit model #33122
Conversation
Thanks @SunMarc! I've tested moving between gpu->cpu->gpu, but not yet on multiple GPUs. We'll still see a warning from accelerate:

Reference note: this should fix #24540 for 4bit. For 8bit there is still a blocker: bitsandbytes-foundation/bitsandbytes#1332; once that's fixed & released on the bitsandbytes side we can do an additional PR.
Co-authored-by: Matthew Douglas <[email protected]>
src/transformers/modeling_utils.py
Outdated
  if getattr(self, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES:
      if getattr(self, "is_loaded_in_4bit", False):
-         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.0"):
+         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.2"):
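The bumped check gates device movement on the installed bitsandbytes version. The same version-gating logic can be sketched in isolation as follows (the helper name `bnb_supports_device_movement` is hypothetical, not from the PR; it assumes the `packaging` library is available):

```python
from packaging import version

# Minimum bitsandbytes release that supports moving 4-bit quantized models
# across devices; the PR bumps this from 0.43.0 to 0.43.2, since the
# required bitsandbytes change only landed in 0.43.2.
_MIN_BNB_VERSION = "0.43.2"

def bnb_supports_device_movement(installed: str) -> bool:
    """Return True if the given bitsandbytes version string is new enough
    to allow .to()/.cuda() on a 4-bit quantized model."""
    return version.parse(installed) >= version.parse(_MIN_BNB_VERSION)
```

`version.parse` handles pre-release and multi-digit components correctly, which is why it is preferred over string comparison here.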
@SunMarc I've bumped this to 0.43.2 since that's when bitsandbytes-foundation/bitsandbytes#1279 was landed.
Nice, thanks for updating the PR!
Thanks for the PR! This looks good.
src/transformers/modeling_utils.py
Outdated
  raise ValueError(
      "Calling `cuda()` is not supported for `4-bit` quantized models. Please use the model as it is, since the"
      " model has already been set to the correct devices and casted to the correct `dtype`. "
      "However, if you still want to move the model, you need to install bitsandbytes >= 0.43.2 "
  )
The warning isn't super clear to me in terms of what the user should or should not do: should they install the new version, or should they just leave the model where it is? I'd try to clarify this a bit.
Good feedback, thanks! Updated. I think in most cases the user would be calling .cuda() without realizing the model is already on a GPU, so I put the current model.device in the message. That should help them decide whether they really meant to move it somewhere else and need to upgrade.
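The error message described above could be assembled along these lines (a minimal sketch; the helper name `unsupported_move_message` is hypothetical and not part of the PR):

```python
def unsupported_move_message(current_device: str) -> str:
    """Build an error message for unsupported .to()/.cuda() calls on a
    4-bit quantized model, including the model's current device so the
    user can tell whether the move was intentional."""
    return (
        "Calling `cuda()` is not supported for `4-bit` quantized models "
        "with the installed version of bitsandbytes. The current device "
        f"is `{current_device}`. If you intended to move the model, "
        "please install bitsandbytes >= 0.43.2."
    )
```

Including the live device string directly addresses the reviewer's concern: the user can see at a glance that the model is already on, say, `cuda:0`.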
* remove to restriction for 4-bit model
* Update src/transformers/modeling_utils.py (Co-authored-by: Matthew Douglas <[email protected]>)
* bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda
* quality fix
* Improve warning message for .to() and .cuda() on bnb quantized models

Co-authored-by: Matthew Douglas <[email protected]>
What does this PR do?
Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we previously put in place.
Needs to be tested. cc @matthewdouglas
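After this change, the workflow the PR enables might look like the following sketch. The model name is a placeholder assumption, and the snippet requires transformers, bitsandbytes >= 0.43.2, and a CUDA device, so it is illustrative rather than something to run as-is:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config; NF4 is a common choice (illustrative, not
# mandated by the PR).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

# Placeholder model name -- substitute your own checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)

# With bitsandbytes >= 0.43.2 installed, these moves no longer raise a
# ValueError as they did before this PR.
model = model.to("cpu")
model = model.cuda()
```

On older bitsandbytes releases the `.to("cpu")` and `.cuda()` calls would hit the ValueError shown earlier in the diff.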