Zombie Job Handling #66

Open
Foxcapades opened this issue Oct 1, 2024 · 0 comments


@Foxcapades
Member

Problem

"Zombie jobs" are jobs that have workspaces in MinIO but are not owned by any eda instance. They are rare, resulting from exceptions thrown by MinIO itself. These jobs appear to be running but are actually dead.

Presently, these jobs cannot be removed or expired and must be deleted manually.

Proposal

To detect and clear these jobs, we will add two new endpoints:

  1. An admin endpoint that lists jobs that currently appear to be in progress.
  2. An admin endpoint that purges a job workspace from MinIO.
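The issue does not specify how the two endpoints are implemented. Below is a minimal sketch of their core logic, assuming (hypothetically) that each job's workspace is a set of MinIO objects under a `<job_id>/` prefix and that a finished job leaves a terminal marker object (the invented names `.complete` and `.failed`). `WorkspaceStore`, `list_in_progress_jobs`, and `purge_job_workspace` are illustrative names only, and the in-memory set stands in for the bucket; a real handler would list and remove objects through a MinIO client instead.

```python
from dataclasses import dataclass, field

# Hypothetical terminal markers; the real workspace layout is not described in the issue.
TERMINAL_MARKERS = frozenset({".complete", ".failed"})


@dataclass
class WorkspaceStore:
    """In-memory stand-in for the MinIO bucket: object keys look like '<job_id>/<file>'."""
    keys: set[str] = field(default_factory=set)

    def job_ids(self) -> set[str]:
        # The job ID is everything before the first '/'.
        return {key.split("/", 1)[0] for key in self.keys}

    def files(self, job_id: str) -> set[str]:
        # File names inside one job's workspace (prefix stripped).
        prefix = job_id + "/"
        return {key[len(prefix):] for key in self.keys if key.startswith(prefix)}


def list_in_progress_jobs(store: WorkspaceStore) -> list[str]:
    """Admin endpoint 1 (sketch): jobs whose workspace has no terminal marker yet."""
    return sorted(
        job_id
        for job_id in store.job_ids()
        if not (store.files(job_id) & TERMINAL_MARKERS)
    )


def purge_job_workspace(store: WorkspaceStore, job_id: str) -> int:
    """Admin endpoint 2 (sketch): delete every object under the job's prefix, return the count."""
    prefix = job_id + "/"  # trailing '/' keeps job 'a1' from matching 'a10'
    doomed = {key for key in store.keys if key.startswith(prefix)}
    store.keys -= doomed
    return len(doomed)


# Usage: 'a1' has finished, 'b2' and 'c3' still look like they are running.
store = WorkspaceStore({"a1/input.tsv", "a1/.complete", "b2/input.tsv", "c3/output.tsv"})
print(list_in_progress_jobs(store))      # ['b2', 'c3']
print(purge_job_workspace(store, "c3"))  # 1
```

Note that the listing cannot tell a live job from a zombie on its own; it only narrows the candidates for the ownership check.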

Using these endpoints in combination with the existing internal-jobs endpoint, we can:

  1. List jobs that appear to be running (actually running jobs will also be in this list).
  2. Use the internal-jobs endpoint on both campuses to check which campus, if any, owns each 'running' job.
  3. If neither campus knows about a job, it is a zombie, and we can safely call the delete endpoint to wipe its workspace.
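The three steps above amount to a set difference: jobs that appear to be running, minus every job that some campus claims. A sketch under stated assumptions: each campus lookup is a hypothetical callable wrapping the internal-jobs endpoint and returning the job IDs that campus owns, `list_running` wraps the proposed listing endpoint, and `purge` wraps the proposed delete endpoint. The second call to `list_running` is an extra guard that is not part of the proposal.

```python
from typing import Callable, Iterable


def find_zombie_jobs(
    apparently_running: Iterable[str],
    campus_owned: Iterable[Iterable[str]],
) -> list[str]:
    """A job is a zombie if it appears to be running but no campus claims ownership of it."""
    owned: set[str] = set()
    for jobs in campus_owned:
        owned.update(jobs)
    return sorted(set(apparently_running) - owned)


def purge_zombies(
    list_running: Callable[[], Iterable[str]],
    campus_lookups: Iterable[Callable[[], Iterable[str]]],
    purge: Callable[[str], object],
) -> list[str]:
    """Run the three-step procedure and return the job IDs whose workspaces were purged."""
    # Step 1: snapshot jobs that appear to be running (real jobs included).
    candidates = set(list_running())
    # Step 2: ask every campus (hypothetical internal-jobs wrappers) which jobs it owns.
    zombies = find_zombie_jobs(candidates, (lookup() for lookup in campus_lookups))
    # Extra guard (not in the issue): keep only jobs that still look in progress,
    # so a job that finished between steps 1 and 2 is not mistaken for a zombie.
    still_running = set(list_running())
    confirmed = [job_id for job_id in zombies if job_id in still_running]
    # Step 3: wipe the workspace of each confirmed zombie.
    for job_id in confirmed:
        purge(job_id)
    return confirmed


# Usage: campus A owns 'b2', campus B owns 'd4', nobody owns 'c3'.
purged: list[str] = []
result = purge_zombies(
    list_running=lambda: ["b2", "c3", "d4"],
    campus_lookups=[lambda: ["b2"], lambda: ["d4"]],
    purge=purged.append,
)
print(result)  # ['c3']
```

Running the ownership check against both campuses before any delete is what makes step 3 safe; a job missing from only one campus's answer is never purged.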