continue training of already saved model (extending TrainRunner) #1065
Comments
+1 for this
So there is the simple way, where we just keep the old history so it's not overwritten; however, if we also want the whole optimizer state and learning-rate-scheduler state, we have to do more engineering. Thoughts? In the first (simple) case it would be a fresh optimizer and fresh schedulers.
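For reference, a minimal plain-PyTorch sketch of what the "complex" case involves: saving and restoring the optimizer and scheduler state dicts alongside the model weights. The module, file name, and hyperparameters here are placeholders, not the scvi-tools training code.

```python
import torch

# Sketch of the "complex" case: persist optimizer and LR-scheduler state
# alongside the model weights so training can resume where it stopped.
# `module` and "ckpt.pt" are placeholders, not scvi-tools internals.
module = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(module.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# ... train for a while, then save everything needed to continue ...
torch.save(
    {
        "model_state": module.state_dict(),
        "optimizer_state": optimizer.state_dict(),  # Adam moments live here
        "scheduler_state": scheduler.state_dict(),
    },
    "ckpt.pt",
)

# Later (possibly in a new process or cluster job), restore all three.
ckpt = torch.load("ckpt.pt")
module.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
scheduler.load_state_dict(ckpt["scheduler_state"])
```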
I only want the old history not overwritten, for what it's worth.
In the more complicated case, we'd have to
Then to the train methods we can maybe add a parameter like
So in either the simple or the complex case, we can do the following: change this line to
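A purely hypothetical sketch of that idea, since the comment above does not show the actual line or parameter name; `model.history_`, `assign_history`, and `continue_training` are assumed names, not confirmed scvi-tools API:

```python
def assign_history(model, new_history: dict, continue_training: bool = False) -> None:
    """Store the new history, but keep the old one when we are continuing."""
    if not continue_training or getattr(model, "history_", None) is None:
        model.history_ = new_history  # fresh run: overwrite as before
    # when continuing, the old and new histories would be merged instead
    # (see the concatenation sketch at the end of this thread)
```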
I think it's ok to create a new optimiser when continuing training (this is what pymc3 does, by the way) - just load the param state dict and continue the history. @adamgayoso is this what you mean by the simple case?
My use case for this is 'train -> save -> potentially start a new cluster job -> load -> continue training'. One problem with this, which I see now, is that when the saved model is loaded, one training step is run and the training history is lost - this will solve that issue, right?
For Pyro-based models we might not have a choice. In general, though, I think it would be nice to maintain the gradient information for optimizers like Adam (which is part of the "complex" solution, though easy with PyTorch Lightning).
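For illustration, a sketch of the Lightning route, which bundles optimizer state (Adam moments) and the epoch counter into a checkpoint. The tiny module and data are stand-ins, and the resume argument name differs between Lightning versions:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class TinyModule(pl.LightningModule):
    """Stand-in LightningModule; scvi-tools' modules are structured differently."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

loader = DataLoader(TensorDataset(torch.randn(32, 4), torch.randn(32, 1)), batch_size=8)

trainer = pl.Trainer(max_epochs=2)
trainer.fit(TinyModule(), loader)
trainer.save_checkpoint("resume.ckpt")  # includes optimizer state and epoch counter

# Later (e.g. in a new cluster job): resume with Adam moments and epoch count
# restored. Older Lightning versions used `Trainer(resume_from_checkpoint=...)`.
new_trainer = pl.Trainer(max_epochs=4)
new_trainer.fit(TinyModule(), loader, ckpt_path="resume.ckpt")
```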
Yes, but again, Pyro models need some special care.
I see. In my opinion, a simple solution should just preserve the history, including when models are loaded. Is it necessary to train a loaded model for one iteration? If this is done just to initialise the guide properly, then maybe it can be done in evaluation mode? For example, just using
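The exact call was cut off above, but the general idea of initialising a Pyro guide without taking an optimisation step could look roughly like this (stand-in model and guide, not scvi-tools code):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer.autoguide import AutoNormal

# Pyro guides create their parameters lazily on the first call, so one pass
# under `torch.no_grad()` populates the param store without a real training step.
def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", data.shape[0]):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

guide = AutoNormal(model)
data = torch.randn(16)

with torch.no_grad():
    guide(data)  # registers the guide's params in pyro.get_param_store()

print(list(pyro.get_param_store().keys()))
```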
So this commit, f9652f2, solves the issue for loading models, but keeping the history when continuing training remains to be addressed, right?
Yes, this issue remains to be addressed (we are getting there). The commit you referenced fixes the loading issue for Pyro models.
Would be great to have the option to continue training an already saved model. @adamgayoso said this needs to go into the TrainRunner. One thing you need to do is to combine the old and new training history like this:
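A minimal sketch of combining the histories, assuming the history is a dict mapping metric names to per-epoch DataFrames (as scvi-tools exposes it); `combine_histories` and the toy ELBO values are illustrative, not existing API:

```python
import pandas as pd

def combine_histories(old_history: dict, new_history: dict) -> dict:
    """Concatenate per-metric DataFrames from a previous run and a new run."""
    combined = {}
    for key in set(old_history) | set(new_history):
        parts = [h[key] for h in (old_history, new_history) if key in h]
        combined[key] = pd.concat(parts).reset_index(drop=True)
    return combined

# Toy example with made-up ELBO values:
old = {"elbo_train": pd.DataFrame({"elbo_train": [120.0, 110.0]})}
new = {"elbo_train": pd.DataFrame({"elbo_train": [105.0, 101.0]})}
print(combine_histories(old, new)["elbo_train"])
```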