continue training from checkpoint_best.pth instead of checkpoint_latest.pth #2563
Comments
Yeah - good idea.
Is there a way to change the parameter that defines the 50-epoch save period? (a hedged sketch follows below)
nnUNetv2_train
The answer was about how to continue training, not validation.
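Regarding the save-period question above, here is a minimal sketch, assuming that the nnU-Net v2 trainer exposes a `save_every` attribute (50 by default in recent releases) and that a custom trainer can be selected with the `-tr` flag of `nnUNetv2_train`. The class name `nnUNetTrainer_SaveEvery10` is hypothetical; verify the attribute name against your installed version before relying on this.

```python
# Hypothetical custom trainer that writes checkpoint_latest.pth more often.
# Assumption: nnUNetTrainer has a `save_every` attribute (50 by default in
# recent nnU-Net v2 releases) controlling the periodic-checkpoint interval.
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer


class nnUNetTrainer_SaveEvery10(nnUNetTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # checkpoint_latest.pth would then be saved every 10 epochs instead of 50
        self.save_every = 10
```

If that attribute exists in your version, the trainer could then be selected with something like `nnUNetv2_train DATASET CONFIG FOLD -tr nnUNetTrainer_SaveEvery10`, provided the class is placed where nnU-Net discovers custom trainers.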
Hi,
I do not understand why you chose to continue training from checkpoint_latest.pth instead of checkpoint_best.pth.
checkpoint_latest.pth is saved every 50 epochs, so when we restart we may lose up to 49 epochs.
In my case, a single epoch can take a very long time (up to 2000 s sometimes), so losing 50 epochs is equivalent to losing about 27 hours of computing.
I found a workaround by simply removing checkpoint_latest.pth from the training log dir.
See nnUNet/nnunetv2/run/run_training.py, line 81 (commit 520e749).
It would be more efficient to compare the dates (or, directly, the epochs) of both checkpoint_best.pth and checkpoint_latest.pth and choose the more recent one.
But maybe I am missing something?
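A minimal sketch of the selection logic proposed above, assuming the checkpoints are ordinary PyTorch files that store a `current_epoch` entry (as recent nnU-Net v2 versions do); if that key is absent, the file modification time is used as a fallback. The helper name `pick_most_recent_checkpoint` is made up for illustration and is not part of nnU-Net.

```python
# Sketch: choose whichever of checkpoint_latest.pth / checkpoint_best.pth
# corresponds to the later training epoch, falling back to file mtime.
import os
import torch


def pick_most_recent_checkpoint(output_folder: str) -> str:
    names = ("checkpoint_latest.pth", "checkpoint_best.pth")
    candidates = [os.path.join(output_folder, n) for n in names]
    candidates = [c for c in candidates if os.path.isfile(c)]
    if not candidates:
        raise FileNotFoundError(f"no checkpoint found in {output_folder}")

    def recency(path):
        ckpt = torch.load(path, map_location="cpu", weights_only=False)
        # assumption: nnU-Net v2 checkpoints carry a 'current_epoch' key
        epoch = ckpt.get("current_epoch", -1) if isinstance(ckpt, dict) else -1
        # prefer the stored epoch; break ties (or missing keys) with the mtime
        return (epoch, os.path.getmtime(path))

    return max(candidates, key=recency)
```

A function like this could replace the fixed choice of checkpoint_latest.pth in the resume logic, so an interrupted run always restarts from whichever checkpoint is actually the most advanced.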