dynamic rate limiting of job submissions? #442
On the cluster I'm using, there is a hard limit of 36 jobs per user that can be running or pending in the SLURM queue. However, I need to run a 200-parameter study. Is there any workaround for this other than splitting the large study up into sub-studies of <= 36 parameters each? It would be ideal if the conductor process could wait until jobs complete and then submit new ones.
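To make the "wait until jobs complete, then submit" idea concrete, here is a rough standalone sketch of a submit-when-below-the-cap loop. It is not Maestro code: the 36-job cap, the poll interval, the list of batch scripts, and the direct `sbatch`/`squeue` calls are all assumptions chosen for illustration.

```python
# Hypothetical sketch: keep at most MAX_QUEUED of this user's jobs in SLURM's
# queue, submitting the next batch script whenever a slot frees up.
import getpass
import subprocess
import time

MAX_QUEUED = 36      # site-imposed running + pending limit per user
POLL_SECONDS = 60    # how often to re-check the queue

def queued_job_count(user: str) -> int:
    """Count this user's running and pending jobs via squeue."""
    out = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.split())

def drain(scripts: list[str]) -> None:
    """Submit batch scripts one at a time, never exceeding MAX_QUEUED."""
    user = getpass.getuser()
    pending = list(scripts)
    while pending:
        if queued_job_count(user) < MAX_QUEUED:
            subprocess.run(["sbatch", pending.pop(0)], check=True)
        else:
            time.sleep(POLL_SECONDS)
```

Separately, if I am reading the Maestro CLI help correctly, `maestro run` has a `--throttle` option that caps the number of in-flight jobs, which may already cover this natively; treat that as something to verify rather than a confirmed answer.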
Following the route described in the docs here (https://maestrowf.readthedocs.io/en/latest/Maestro/how_to_guides/running_with_flux.html#launch-maestro-external-to-the-batch-jobflux-broker) seems like the best option for my use-case. I've managed to install Flux via Spack on this cluster. The one remaining issue is that I have to wait until the SLURM job starts before I can launch Maestro against the Flux broker. If I wanted to modify the Maestro conductor code so that it polls SLURM to see whether the Flux broker job has started, where should I start? Is this feasible?
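To make the polling idea concrete, here is a minimal sketch of waiting for a specific SLURM job (for example, the one hosting the Flux broker) to reach the RUNNING state before moving on. The function name, the poll interval, and the reliance on `squeue`'s `%T` state field are illustrative assumptions, not existing Maestro code.

```python
# Hypothetical helper: block until a given SLURM job reports RUNNING,
# polling squeue at a fixed interval.
import subprocess
import time

def wait_until_running(jobid: str, poll_seconds: int = 30) -> None:
    """Poll squeue until the job's state is RUNNING."""
    while True:
        state = subprocess.run(
            ["squeue", "-j", jobid, "-h", "-o", "%T"],
            capture_output=True, text=True,
        ).stdout.strip()
        if state == "RUNNING":
            return
        if not state:
            # The job is no longer in the queue: it may have finished or
            # failed before we ever saw it running.
            raise RuntimeError(f"job {jobid} is no longer queued")
        time.sleep(poll_seconds)

# e.g. wait_until_running("123456") before pointing Maestro at the broker
```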
Hi @BenWibking -- one thing to note is that
Adding I was a bit thrown off by the wording in the documentation for the
I checked the status of this study today and it seems to have stopped submitting new jobs to SLURM.
The last log entry is:
The full log for this study is here:
The conductor process for this is still running:
This seems to happen reliably for studies that I run on this machine.
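As a side note, for anyone trying to confirm the same symptom (conductor alive but the log gone quiet), here is a small diagnostic sketch; the PID and log path are placeholders, not values from this study.

```python
# Hypothetical stall check: is the conductor process still alive, and how long
# ago did it last write to its log? PID and log path are placeholders.
import os
import time

CONDUCTOR_PID = 12345                            # e.g. taken from `ps`
LOG_PATH = "/path/to/study/logs/conductor.log"   # placeholder path

def process_alive(pid: int) -> bool:
    """Signal 0 tests for existence without disturbing the process."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # exists, but owned by someone else

idle = time.time() - os.path.getmtime(LOG_PATH)
print(f"conductor alive: {process_alive(CONDUCTOR_PID)}; "
      f"log last written {idle:.0f} s ago")
```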
This issue seems to be the same as #441, which has more informative logs, so I'll close this.