Hi Daniel - just stumbled across your package - very nice! I was hoping to find such an extension rather than write one from scratch :-)
In reading through the docs, one question I had for you was whether you've thought much about job-specific polling intervals. In particular, over the lifecycle of a job it would be of interest to balance keeping information fresh against hitting the slurmdb too hard.
For example, a heuristic might poll every 5-10 seconds for the first minute of a job, then back off with some sort of step function to a steady state of every 300 seconds. That would better catch jobs that fail as they start, but settle down once the job really gets going.
Curious if this is something you've thought about much?
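Just to make the idea concrete, here's a rough sketch of the kind of step function I mean (the thresholds and intervals below are made-up illustrative values, not anything from your package):

```python
def poll_interval(elapsed_seconds: float) -> int:
    """Return the polling interval (seconds) given time since the job started.

    Polls frequently right after a job starts, then backs off in steps
    to a steady-state interval. All numbers here are illustrative.
    """
    schedule = [
        (60, 5),     # first minute: every 5 s, to catch early failures
        (300, 30),   # up to 5 min: every 30 s
        (900, 120),  # up to 15 min: every 2 min
    ]
    for threshold, interval in schedule:
        if elapsed_seconds < threshold:
            return interval
    return 300       # steady state: every 5 min

print(poll_interval(10))    # 5
print(poll_interval(600))   # 120
print(poll_interval(3600))  # 300
```

The schedule table would presumably be configurable, since the right trade-off depends on how loaded the slurmdb is on a given cluster.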
That would certainly be a useful feature! One potential issue is detecting when jobs start. If a job is pending in the queue, then you'd have to poll it more frequently to detect when it starts. The polling heuristic would be less helpful if it missed the job start time by a significant margin.
It could still be useful on systems where jobs tend to run immediately or the refresh happens to catch the beginning of a job. The "refresh curve" could apply to the polling interval for pending jobs as well.
My next objective is to add better job array support, but I can look into this after that. Thanks for the suggestion :)