
ValueError: Can't find a valid checkpoint at /data1/path/checkpoints/stage_2/checkpoint-50000 #78

Open
daixixiwang opened this issue Aug 1, 2024 · 0 comments


When I run `sh ./scripts/tune_script/graphgpt_stage2.sh`, I get the following error:

raise ValueError("Can't find a valid checkpoint at {resume_from_checkpoint}")
ValueError: Can't find a valid checkpoint at /data1/path/checkpoints/stage_2/checkpoint-50000

I checked /data1/path/checkpoints/stage_2/checkpoint-50000, and it contains the following files:

config.json
generation_config.json
pytorch_model-00001-of-00003.bin
pytorch_model-00002-of-00003.bin
rng_state_1.pth

Has anyone encountered a similar issue, where the checkpoint files exist but the script reports that it cannot find a valid checkpoint?
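One thing worth ruling out is an incomplete checkpoint directory: the listing shows shards 1 and 2 of 3, and no weight index file. The sketch below is a small diagnostic, not part of the repository. It assumes the script resumes through the Hugging Face `Trainer`, which (in the releases I am aware of) raises this error when none of a few well-known weight files is present. The exact set of file names can differ between `transformers` versions, and the helper names `inspect_checkpoint` and `inspect_checkpoint_dir` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: file names a Hugging Face Trainer typically looks for when
# resuming. The exact set depends on the installed transformers version.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "model.safetensors",
    "pytorch_model.bin.index.json",
    "model.safetensors.index.json",
)

# Matches sharded weight files such as pytorch_model-00001-of-00003.bin.
SHARD_RE = re.compile(
    r"^(?P<stem>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.(?P<ext>bin|safetensors)$"
)


def inspect_checkpoint(names):
    """Report which entry files exist and which shards are missing.

    `names` is an iterable of file names found in the checkpoint directory.
    """
    names = set(names)
    entry_files = [n for n in WEIGHT_FILES if n in names]

    missing_shards = []
    seen_groups = set()
    for name in sorted(names):
        match = SHARD_RE.match(name)
        if match is None:
            continue
        group = (match["stem"], match["total"], match["ext"])
        if group in seen_groups:
            continue
        seen_groups.add(group)
        stem, total, ext = group
        # Every shard from 1..total should be present in the directory.
        for i in range(1, int(total) + 1):
            expected = f"{stem}-{i:05d}-of-{total}.{ext}"
            if expected not in names:
                missing_shards.append(expected)

    return {"entry_files": entry_files, "missing_shards": missing_shards}


def inspect_checkpoint_dir(path):
    """Convenience wrapper that reads the file names from disk."""
    return inspect_checkpoint(p.name for p in Path(path).iterdir())
```

Calling `inspect_checkpoint_dir("/data1/path/checkpoints/stage_2/checkpoint-50000")` returns a dictionary with the recognized entry files and any shards that the shard names imply but that are absent. An empty `entry_files` list or a non-empty `missing_shards` list would suggest that the checkpoint was not saved completely.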
