Add QLoRA #820
base: dev
Conversation
Okay, QLoRA works as expected now (with mixed precision as well)! 🥳

PS. The bitsandbytes optimizers cannot handle the precision at which AMP computes gradients; this caused some of the earlier confusion. The PyTorch optimizers, on the other hand, handle this without issue.

PPS. The memory advantage over LoRA is quite minor. I do see some advantage when trying it over. I'll run a full-scale training on LIVECell and perform a final validation to confirm everything. The metric and loss look good so far!
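To make the optimizer point concrete, here is a minimal sketch of the AMP pattern with a standard PyTorch optimizer (`torch.optim.AdamW`) in place of a bitsandbytes one. The tiny linear model, shapes, and learning rate are illustrative stand-ins, not taken from this PR:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical tiny model standing in for the SAM + LoRA setup;
# the point is the AMP training-step pattern, not the architecture.
model = torch.nn.Linear(8, 1).to(device)

# A standard PyTorch optimizer instead of a bitsandbytes optimizer,
# since the bnb optimizers choke on the gradient dtypes AMP produces.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 8, device=device)
target = torch.randn(4, 1, device=device)

optimizer.zero_grad()
with torch.autocast(
    device_type=device,
    dtype=torch.float16 if device == "cuda" else torch.bfloat16,
):
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales gradients, then optimizer.step()
scaler.update()
```

On CPU the `GradScaler` is disabled and the step degrades gracefully to a plain optimizer step, so the same code runs with or without a GPU.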
```python
# QLoRA: ~65.68 GB
# LoRA:  ~67.14 GB
# FFT:   ~72.34 GB

# Run training.
```
Here are the peak memory usage numbers measured during training.
This PR adds finetuning SAM using QLoRA.
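For context, a minimal numpy sketch of the idea behind (Q)LoRA: the base weight `W` is frozen (in QLoRA it is additionally stored 4-bit quantized), and only a low-rank update `B @ A` is trained. All names, shapes, and the rank/alpha values below are illustrative assumptions, not the PR's actual configuration:

```python
import numpy as np

# Hypothetical dimensions: rank r << d_in, so only A and B are trained.
d_in, d_out, r, alpha = 64, 64, 4, 8.0
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen base weight (4-bit quantized in QLoRA)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, initialized to zero

def lora_forward(x):
    # Out-of-place: base output plus the scaled low-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
y = lora_forward(x)

# With B initialized to zero, the adapter contributes nothing at the
# start, so the LoRA model initially matches the frozen base model.
assert np.allclose(y, x @ W.T)
```

The trainable parameter count here is `r * (d_in + d_out) = 512` versus `d_in * d_out = 4096` for the full weight, which is where LoRA's savings come from; QLoRA additionally shrinks the frozen base weights via quantization.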
TLDR: The in-place operation in LoRA's forward pass was causing the trouble for us. It is taken care of now ;)
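A small self-contained demo of the failure mode, assuming the issue is the standard autograd one (the PR does not show the exact offending line): when an op's backward needs its own output, modifying that output in place breaks the backward pass, while the out-of-place version works fine.

```python
import torch

x = torch.randn(3, requires_grad=True)

# In-place version: exp()'s backward uses its own output, so
# modifying that output in place corrupts the autograd graph.
y = torch.exp(x)
y += 1.0
try:
    y.sum().backward()
    failed = False
except RuntimeError:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    failed = True

# Out-of-place version: allocate a new tensor instead of mutating.
z = torch.exp(x) + 1.0
z.sum().backward()  # works
```

The fix in a LoRA forward is the same shape: return `base_out + lora_out` instead of mutating `base_out` with `+=`.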
PS. QLoRA works now and the loss and metric look as desired (checked for a couple of epochs so far). There are still two major concerns: 1) QLoRA currently only works in full-precision training, 2) the memory usage is the same as LoRA :/
cc: @caroteu @constantinpape