No saving of checkpoints during last prediction phase of training after all epochs are saved #1764
paul-reiners asked this question in Q&A
We're running into a small problem. After the 1000 epochs of training are done, one final part is run, where predictions are made.
It looks like this:
No checkpoints are saved during this section. So if our SLURM job times out, nnU-Net starts this section over from the beginning when we restart it (even with the -c option). Given a 24-hour SLURM time limit, it never finishes this section for any of the folds.
Can we enable the saving of checkpoints of nnU-Net during this section?
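To illustrate the kind of behaviour we are asking for, here is a minimal sketch of a resumable prediction loop. It is not nnU-Net code: `predict_resumable`, `predict_case`, and the `.npz` suffix are hypothetical names chosen for this example. The assumption is that each case's prediction is written to its own file, so a restarted job can skip the cases that already have one.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def predict_resumable(
    case_ids: Iterable[str],
    output_dir: Path,
    predict_case: Callable[[str], bytes],  # hypothetical: returns the serialized prediction
    suffix: str = ".npz",                  # assumed output extension
) -> List[str]:
    """Run predictions case by case, skipping cases finished in an earlier run.

    Returns the list of case ids that were predicted in this call.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    newly_done: List[str] = []
    for case_id in case_ids:
        target = output_dir / f"{case_id}{suffix}"
        if target.exists():
            # Output from a previous (possibly timed-out) job: do not redo it.
            continue
        # Write to a temporary file first and rename afterwards, so a job
        # killed mid-write never leaves a truncated file that looks complete.
        tmp = output_dir / (target.name + ".tmp")
        tmp.write_bytes(predict_case(case_id))
        tmp.replace(target)
        newly_done.append(case_id)
    return newly_done
```

With something like this, a job that times out after half of the cases would only need to process the remaining half on the next submission.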