[Feature Request] Resume study and avoid OOM when optimizing with Optuna plugin #1679
Comments
Hi @dianemarquette, In any case, we do not have the cycles for it, which means this will only happen if someone from the community wants to work toward it. Supporting gc_after_trial seems like it should be straightforward though.
@omry Thanks for your quick reply. Any idea when
It can be supported after someone sends a pull request to add support for it.
Ok, thanks for the clarification :)
@dianemarquette As of now, resuming trials is somewhat supported by the Optuna optimizer, by setting a storage backend: hydra.sweeper.study_name=my_trial. But this always restarts the job numbering from scratch and will therefore overwrite the output directories of individual jobs.
What's more, it is able to resume the study, but it will inevitably launch replicated runs for a given parameter combination: it will still run 80 times (assuming a grid search with 80 experiments in total) without skipping the experiments that have already been executed successfully.
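The "skip what has already run" behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not how Hydra or the Optuna sweeper works: run_grid, param_key, and the JSON results file are hypothetical names chosen for this example.

```python
import itertools
import json
from pathlib import Path


def param_key(params: dict) -> str:
    # Stable string key for one parameter combination.
    return json.dumps(params, sort_keys=True)


def run_grid(grid: dict, run_fn, results_path: Path) -> dict:
    """Run every combination in `grid`, skipping those already stored on disk."""
    results = json.loads(results_path.read_text()) if results_path.exists() else {}
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        key = param_key(params)
        if key in results:
            # Already executed successfully in an earlier (possibly crashed) sweep.
            continue
        results[key] = run_fn(params)
        # Persist after every run so a crash loses at most the current run.
        results_path.write_text(json.dumps(results))
    return results
```

Calling run_grid a second time with the same results file only executes the combinations that are missing from it, so a crashed 80-experiment sweep would resume with the remaining experiments instead of starting all 80 again.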
And yes, gc collect is a feature I want too, because right now, no matter how I set the n_jobs or pre_dispatch params, the finished jobs still exist and will not exit until the next group of parallel trials finishes.
Hydra has callbacks which can probably be used for it.
As far as I can tell we only need to add You can find my code here. I do not have much experience with Would be great if someone could help me 😅 Also, I am not sure how to write a test that actually tests what I coded, since gc.collect() does not return anything. I managed to modify a test and added gc_after_trial, and the config was built correctly. But we would need a test that actually loads a model with CUDA, right?
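On the testing question: gc.collect() does return a value (the number of unreachable objects it found), but a test based on weak references may be more robust and needs no CUDA. The sketch below is one possible approach, not the plugin's actual test: FakeModel, run_trials, and count_alive are hypothetical names, and the reference cycle stands in for whatever keeps a real model alive.

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a model. The self-reference forms a cycle, so only the
    cyclic garbage collector (not reference counting) can free it."""

    def __init__(self):
        self.self_ref = self


def run_trials(n_trials: int, gc_after_trial: bool) -> list:
    """Simulate trials and return weak references to each trial's model."""
    refs = []
    for _ in range(n_trials):
        model = FakeModel()
        refs.append(weakref.ref(model))
        del model  # the cycle keeps the object alive until a GC pass
        if gc_after_trial:
            gc.collect()
    return refs


def count_alive(refs) -> int:
    # A weak reference returns None once its target has been collected.
    return sum(1 for ref in refs if ref() is not None)
```

A test would disable automatic collection with gc.disable(), then check that count_alive(run_trials(5, True)) is 0 while count_alive(run_trials(5, False)) stays at 5 until gc.collect() is called explicitly.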
🚀 Feature Request
I would like to be able to resume a study and to set the gc_after_trial parameter of Optuna's study.optimize().
Motivation
Is your feature request related to a problem? Please describe.
I'm always frustrated when my code crashes after 60 trials (out of 100). I suspect an OOM error. Being able to prevent the script from crashing in the first place with gc.collect() would be great. However, at least being able to resume my search from where it stopped would be a game changer.
Pitch
Describe the solution you'd like
I would like to set gc_after_trial to True and specify a path to store my study parameters after each trial in my Optuna sweeper Hydra config.
Describe alternatives you've considered
I read Optuna's documentation but I'm not sure how to make their examples work with Hydra:
Are you willing to open a pull request? (See CONTRIBUTING)
I'm not comfortable enough with the Optuna and Hydra libraries to prepare a pull request.