Add QLoRA #820 (Draft)

anwai98 wants to merge 11 commits into dev

Conversation

anwai98 (Contributor) commented Dec 21, 2024

This PR adds support for finetuning SAM with QLoRA.

TL;DR: The in-place operation in LoRA's forward pass was causing the trouble for us. It is taken care of now ;)
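
For context, here is a minimal sketch of the pattern at issue (my illustration, not the actual micro_sam code): when the base layer is quantized, an in-place residual update in LoRA's forward pass can break autograd, so the sum has to be computed out-of-place.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper; names and the rank are placeholders."""
    def __init__(self, base: nn.Module, in_dim: int, out_dim: int, rank: int = 4):
        super().__init__()
        self.base = base  # e.g. a 4-bit quantized linear layer from bitsandbytes
        self.lora_a = nn.Linear(in_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_dim, bias=False)

    def forward(self, x):
        out = self.base(x)
        # out += self.lora_b(self.lora_a(x))    # in-place: can break autograd
        out = out + self.lora_b(self.lora_a(x))  # out-of-place: safe
        return out

# Demo with a plain (unquantized) base layer:
lora = LoRALinear(nn.Linear(16, 16), 16, 16)
y = lora(torch.randn(4, 16))
```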

PS. QLoRA works now and the loss and metric look as desired (checked for a couple of epochs). There are still two major concerns: 1) QLoRA currently only works with full-precision training, 2) the memory consumption is the same as LoRA's :/

cc: @caroteu @constantinpape

anwai98 (Contributor, Author) commented Dec 22, 2024

Okay, QLoRA works as expected now (with mixed precision as well)! 🥳

PS. The bitsandbytes optimizers cannot handle the precision at which AMP computes gradients, which caused some of the remaining trouble. The PyTorch optimizers apparently handle this without issue.
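
As an illustration of the setup that works, here is a hedged sketch with a stand-in model (not the actual micro_sam training loop): AMP autocast for the forward pass, a gradient scaler, and a standard PyTorch optimizer instead of a bitsandbytes one.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(16, 1).to(device)  # stand-in for the quantized SAM + LoRA model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # PyTorch, not bnb
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 1, device=device)

for _ in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), y)  # forward in mixed precision
    scaler.scale(loss).backward()  # scaled backward pass; param grads stay fp32
    scaler.step(optimizer)         # unscales gradients, then takes the step
    scaler.update()
```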

PPS. The memory advantage over LoRA is very minor. I do see some advantage with vit_h, but not for the smaller ViTs.

I'll run a full-scale training on LIVECell and perform a final validation to confirm. The metric and loss look good so far!

# QLoRA: ~65.68 GB
# LoRA:  ~67.14 GB
# FFT:   ~72.34 GB

# Run training.
anwai98 (Contributor, Author) commented on the diff:

Here are the memory usage figures recorded while training (see the comments in the snippet above).
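
For reference, a minimal sketch (an assumption on my part, not the script that produced the numbers above) of how peak GPU memory can be recorded around a training run:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak GPU memory: ~{peak_gb:.2f} GB")
```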
