Bot job manager crashes #191

Open
TopRichard opened this issue Jun 27, 2023 · 2 comments

Comments

TopRichard commented Jun 27, 2023

In PR NorESSI/software-layer#132, jobs 5477 and 5478 were submitted, yet the URLs for those jobs are not accessible. As a result, the bot is unable to update the job status, and the job manager crashes.

Note: The jobs were manually cancelled, but it seems unlikely that this deleted the corresponding comments on GitHub.

Contributor

trz42 commented Jun 27, 2023

Additional information:

Collaborator

laraPPr commented Feb 21, 2024

Opened a PR so that the job manager does not crash if Slurm is temporarily unavailable.
I have only tested that the failure is logged correctly in the PyGhee log when the squeue command fails.
This can make the bot's behaviour look a little odd, because I have seen what the bot does when jobs suddenly disappear and reappear as current_jobs.
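
To illustrate the idea, here is a minimal sketch of tolerating a failing squeue call. It is not the code from the PR: the function names `parse_squeue`, `query_current_jobs` and `poll_once` are hypothetical, and Python's standard `logging` module stands in for the PyGhee log. When squeue fails, the previous view of the jobs is kept, so jobs are not treated as having disappeared.

```python
import logging
import subprocess
from typing import Callable, Dict, Optional

log = logging.getLogger("job_manager")  # stand-in for the PyGhee log


def parse_squeue(output: str) -> Dict[str, str]:
    """Map job id -> state from lines of the form '<jobid> <state>'."""
    jobs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            jobs[fields[0]] = fields[1]
    return jobs


def query_current_jobs(run: Callable = subprocess.run) -> Optional[Dict[str, str]]:
    """Return the current jobs, or None if squeue could not be queried."""
    cmd = ["squeue", "--noheader", "--format=%i %T"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as err:
        # squeue is missing, not executable, or did not answer in time
        log.error("squeue could not be run: %s", err)
        return None
    if result.returncode != 0:
        # Slurm is reachable but reported an error (e.g. controller down)
        log.error("squeue failed (exit %s): %s", result.returncode, result.stderr)
        return None
    return parse_squeue(result.stdout)


def poll_once(known_jobs: Dict[str, str],
              query: Callable[[], Optional[Dict[str, str]]] = query_current_jobs
              ) -> Dict[str, str]:
    """One polling step: keep the previous view if Slurm is unavailable."""
    current = query()
    if current is None:
        return known_jobs  # skip this round instead of crashing
    return current
```

Returning `None` rather than an empty dictionary lets the caller distinguish "Slurm did not answer" from "there are no jobs", which is what avoids jobs appearing to vanish and reappear.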


4 participants