
Initializing of models from checkpoints/pretrained models has gotten a bit crazy #780

Open
dlwh opened this issue Oct 30, 2024 · 0 comments


dlwh commented Oct 30, 2024

We have a lot of ways of initializing models right now:

  1. initializing from scratch
  2. loading the whole state from a levanter checkpoint of the current run (default if present)
  3. loading the whole state from a levanter checkpoint of a different run (trainer.load_checkpoint_path)
  4. loading just the model weights from a levanter checkpoint (trainer.initialize_from)
  5. loading the weights and optimizer state, with an eye towards changing the data (.initialize_from)
  6. loading pretrained weights from HuggingFace (--initialize_from_hf)
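For concreteness, a rough sketch of how several of these options might appear together in a training config. The key names (load_checkpoint_path, initialize_from, initialize_from_hf) are taken from the list above, but the exact placement and value formats here are illustrative assumptions, not the actual schema:

```yaml
# Hypothetical config fragment -- field placement and paths are made up
# for illustration; only the option names come from the issue.
trainer:
  load_checkpoint_path: gs://my-bucket/other-run/checkpoints  # (3) full state from a different run
  initialize_from: gs://my-bucket/base-run/checkpoints        # (4) model weights only

initialize_from_hf: some-org/some-model                       # (6) pretrained HF weights
```

Part of the confusion is that several of these options look nearly identical in a config but differ in which parts of the state (model, optimizer, data position) they restore.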

This is convoluted. We should come up with a rational strategy here.
