Auto3DSeg + segresnet: code for resuming partially-trained folds #7506
pwrightkcl started this conversation in Ideas
Replies: 1 comment
-
Worked great, was able to continue my training by meshing this into my train command. Saved me days of almost wasted training, thank you!
-
My Auto3DSeg use case is this:
I have separate scripts for data analysis, training, etc. to help with parallelisation, based on this tutorial notebook. Note that I recently learned you can also run selected steps using `AutoRunner`. Read more in this reply. I think for resuming you need to use a standalone script, but I welcome more info.
The short solution is to add these keyword arguments to the `train` call on a `BundleAlgo` object:
Note again that I am using segresnet, and this solution probably won't work for dints and swinunetr (it might work for segresnet2d, but I haven't tested it). The keyword arguments are referenced in the segresnet code here and here. If you dig into the scripts for the other algo types, you may be able to come up with a similar solution.
Also note that when checking whether the checkpoint file has the full number of epochs, I have to adjust the epoch counter, because segresnet adjusts the number of epochs internally using `num_crops_per_image`. I don't fully understand how that works, so I would appreciate any correction. It looks like the epoch counter in the checkpoint increments using the correction factor as a step size. For example, if you have 2 crops per image and 400 epochs, it will run for 200 epochs and count from 0 to 398 in steps of 2, rather than 0 to 199.
Below is my code for training with resuming built in. I'm not a MONAI dev, just a user, so it is supplied "as is" in the hope it will save other users time. You will need to test it out in your own environment.
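The epoch-counter arithmetic described above can be sketched on its own as follows. This is a minimal illustration of the counting only, not the training script itself. The helper names `last_epoch_counter` and `is_training_complete` are hypothetical, and the sketch assumes the counter behaves as observed (starting at 0 and stepping by `num_crops_per_image`), which has not been confirmed against the segresnet source. How segresnet handles an epoch count that is not divisible by the number of crops is also an assumption here.

```python
def last_epoch_counter(num_epochs: int, num_crops_per_image: int) -> int:
    """Largest epoch counter expected in a fully trained checkpoint.

    Assumption (from observation, not from the segresnet source): the
    counter starts at 0 and advances by num_crops_per_image each epoch.
    """
    if num_epochs <= 0 or num_crops_per_image <= 0:
        raise ValueError("num_epochs and num_crops_per_image must be positive")
    # Counter values actually visited: 0, step, 2*step, ... below num_epochs.
    return range(0, num_epochs, num_crops_per_image)[-1]


def is_training_complete(ckpt_epoch: int, num_epochs: int, num_crops_per_image: int) -> bool:
    """True if the checkpoint's epoch counter has reached the final value."""
    return ckpt_epoch >= last_epoch_counter(num_epochs, num_crops_per_image)


if __name__ == "__main__":
    # 400 configured epochs with 2 crops per image: final counter is 398.
    print(last_epoch_counter(400, 2))
    # A checkpoint saved at counter 198 is only about halfway through.
    print(is_training_complete(198, 400, 2))
```

Comparing against the adjusted final counter rather than the configured number of epochs avoids treating a finished fold as incomplete and resuming it unnecessarily.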