
Initializing of models from checkpoints/pretrained models has gotten a bit crazy #780

Open
dlwh opened this issue Oct 30, 2024 · 0 comments


dlwh commented Oct 30, 2024

We have a lot of ways of initializing models right now:

  1. initializing from scratch
  2. loading the whole state from a levanter checkpoint of the current run (default if present)
  3. loading the whole state from a levanter checkpoint of a different run (trainer.load_checkpoint_path)
  4. loading just the model weights from a levanter checkpoint (trainer.initialize_from)
  5. loading the weights and optimizer state, with an eye towards changing the data (.initialize_from)
  6. loading pretrained weights from HuggingFace (--initialize_from_hf)
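For concreteness, a rough sketch of how several of these options might appear together in a training config. The key names (load_checkpoint_path, initialize_from, initialize_from_hf) are taken from the list above, but the exact placement and value formats here are illustrative assumptions, not the actual schema:

```yaml
# Hypothetical config fragment -- field placement and paths are made up
# for illustration; only the option names come from the issue.
trainer:
  load_checkpoint_path: gs://my-bucket/other-run/checkpoints  # (3) full state from a different run
  initialize_from: gs://my-bucket/base-run/checkpoints        # (4) model weights only

initialize_from_hf: some-org/some-model                       # (6) pretrained HF weights
```

Part of the confusion is that several of these options look nearly identical in a config but differ in which parts of the state (model, optimizer, data position) they restore.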

This is convoluted. We should come up with a rational strategy here.
