refactor(optimizer): use optimizer_cls_and_kwargs for custom optim #2012

Open · wants to merge 6 commits into main
Conversation

NanoCode012 (Collaborator) commented on Nov 4, 2024

Breaking change

  • Deprecates alternate_optimizer; use optimizer instead to consolidate.

Description

Refactors optimizer handling to offload custom optimizer construction from create_optimizer into the Trainer itself. Only lora_plus still requires overriding that function (see the sketch after the list below).

  • ao_adamw_4bit is available in transformers as adamw_torch_4bit.
  • lion_pytorch is available in transformers as lion_32bit.
  • Removed the default handling of loraplus_lr_embedding so it inherits from the model input class instead.
  • Updated the docs to include the new upstream optimizers.
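
In rough terms, the new hook lets the optimizer class and its kwargs be handed straight to the Trainer instead of overriding create_optimizer. A minimal sketch (not the PR's actual code), assuming transformers >= 4.47.0, where Trainer.__init__ accepts optimizer_cls_and_kwargs as an (optimizer class, kwargs dict) tuple per huggingface/transformers#34358:

```python
# Minimal sketch, not the PR's code. Assumes transformers >= 4.47.0, where
# Trainer.__init__ accepts `optimizer_cls_and_kwargs` as a
# (optimizer_class, kwargs_dict) tuple (huggingface/transformers#34358).
import torch
from transformers import Trainer, TrainingArguments


def build_trainer(model, train_dataset):
    args = TrainingArguments(output_dir="./out", per_device_train_batch_size=1)
    # Hand the Trainer the optimizer class and kwargs; the Trainer constructs
    # the optimizer itself, so no create_optimizer override is needed.
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizer_cls_and_kwargs=(
            torch.optim.AdamW,
            {"lr": 1e-4, "weight_decay": 0.01},
        ),
    )
```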

TODO:

  • Verify adamw_anyprecision is not affected
  • Verify GaLore is not affected
  • Discuss whether to support other optimi optimizers: https://optimi.benjaminwarner.dev/which_optimizer/
  • Update transformers to a commit that includes the linked PR
  • Test custom optimizers
  • Test upstream optimizers

Motivation and Context

This is due to the upstream PR that opens a new config option: huggingface/transformers#34358 (see the contrast sketch below).
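
For contrast, here is a rough sketch of the create_optimizer-override pattern that this refactor removes for most optimizers; per the description above, only lora_plus still needs it. The class name, lr ratio, and parameter grouping here are illustrative assumptions, not axolotl's actual LoRA+ code:

```python
# Rough sketch of the create_optimizer-override pattern this refactor removes
# for most optimizers. The grouping below is illustrative only, not axolotl's
# actual LoRA+ implementation.
import torch
from transformers import Trainer


class LoraPlusTrainer(Trainer):  # hypothetical name, for the sketch only
    def create_optimizer(self):
        if self.optimizer is None:
            # Illustrative LoRA+-style grouping: give LoRA "B" matrices a
            # larger learning rate than the rest of the trainable parameters.
            base_lr = self.args.learning_rate
            lr_ratio = 16.0  # assumed loraplus_lr_ratio for the sketch
            groups = [
                {"params": [], "lr": base_lr},
                {"params": [], "lr": base_lr * lr_ratio},
            ]
            for name, param in self.model.named_parameters():
                if not param.requires_grad:
                    continue
                groups[1 if "lora_B" in name else 0]["params"].append(param)
            groups = [g for g in groups if g["params"]]  # drop empty groups
            self.optimizer = torch.optim.AdamW(
                groups, weight_decay=self.args.weight_decay
            )
        return self.optimizer
```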


Next steps

Once some time has passed:

  • Deprecate lion_pytorch and remove the dependency (see the sketch below)
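
If lion_pytorch is eventually dropped, the upstream equivalents named in the description would be selected through TrainingArguments.optim alone. A minimal sketch, assuming the respective backends (bitsandbytes for lion_32bit, torchao for adamw_torch_4bit) are installed:

```python
# Sketch of selecting the upstream optimizers named above via
# TrainingArguments.optim; assumes bitsandbytes (for lion_32bit) and
# torchao (for adamw_torch_4bit) are installed.
from transformers import TrainingArguments

lion_args = TrainingArguments(
    output_dir="./out", optim="lion_32bit", learning_rate=1e-4
)
adamw_4bit_args = TrainingArguments(
    output_dir="./out", optim="adamw_torch_4bit", learning_rate=1e-4
)
```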

winglian (Collaborator) commented on Dec 6, 2024

@NanoCode012 now that transformers 4.47.0 is out, we can re-open this.

NanoCode012 marked this pull request as ready for review on December 9, 2024.
NanoCode012 (Collaborator, Author) commented

I noticed that the embedding-scaled optimizer change was only applied to the SFT trainer. Should it also be applied to DPOTrainer? @winglian
