Discuss daemonset state removal #196

Open
asm582 opened this issue Oct 24, 2024 · 4 comments

@asm582
Contributor

asm582 commented Oct 24, 2024

In the current system, the daemonset sets the state of each instance allocation to created (when the MIG slice is created on the GPU and the ConfigMap is added to the pod's namespace) and to deleted (when the MIG slice is deleted from the GPU and the ConfigMap is deleted from the namespace).
Instead, the controller can listen to ConfigMap events and move the allocation to ungated when a ConfigMap create event occurs, or remove the allocation when a ConfigMap delete event occurs. This approach reduces the ping-pong of two controllers changing the allocation status for the same pod.
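
A rough sketch of the event-driven transition described above, in plain Python rather than the project's actual controller code. The `AllocationStore` class, the `on_configmap_event` method, and the `creating` starting status are assumptions made for illustration; only the `ungated` status and the create/delete behavior come from this issue. `ADDED` and `DELETED` are the standard Kubernetes watch event types.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationStore:
    # Maps pod UID -> allocation status. Hypothetical stand-in for the
    # allocations kept on the Instaslice object.
    allocations: dict = field(default_factory=dict)

    def on_configmap_event(self, event_type: str, pod_uid: str) -> None:
        """React to a ConfigMap watch event for the given pod."""
        if event_type == "ADDED":
            # The slice exists and its ConfigMap is in place: ungate the pod.
            if pod_uid in self.allocations:
                self.allocations[pod_uid] = "ungated"
        elif event_type == "DELETED":
            # The slice was torn down: drop the allocation entirely.
            self.allocations.pop(pod_uid, None)


# "creating" is an assumed initial status set by the controller.
store = AllocationStore({"pod-a": "creating"})

store.on_configmap_event("ADDED", "pod-a")
after_create = dict(store.allocations)

store.on_configmap_event("DELETED", "pod-a")
after_delete = dict(store.allocations)
```

With this shape the daemonset never writes allocation status; it only creates or deletes the ConfigMap, and the controller owns every status change.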

@harche
Contributor

harche commented Oct 24, 2024

Events in k8s are not guaranteed to arrive in sequence. So during rapid starts and stops, the deleted event may arrive before the created one.
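
One common way to sidestep ordering problems is to treat each event only as a trigger and decide from the currently observed state instead of from the event type. The sketch below assumes the controller records its own intent as a `creating` or `deleting` status (hypothetical names, not confirmed by this issue) and can check whether the ConfigMap exists at reconcile time; `reconcile` and `run` are illustrative helpers.

```python
from typing import Optional


def reconcile(status: str, configmap_exists: bool) -> Optional[str]:
    """Return the next allocation status, or None to remove the allocation.

    The decision depends only on the recorded intent and on whether the
    ConfigMap exists right now, never on which event woke the controller.
    """
    if status == "creating" and configmap_exists:
        return "ungated"
    if status == "deleting" and not configmap_exists:
        return None
    return status


def run(triggers, status, configmap_exists):
    """Reconcile once per trigger; the trigger's type is deliberately ignored."""
    for _ in triggers:
        if status is None:
            break
        status = reconcile(status, configmap_exists)
    return status


# Rapid start and stop: by the time the controller reconciles, the pod is
# being deleted and the ConfigMap is already gone. Event order is irrelevant.
in_order = run(["ADDED", "DELETED"], "deleting", False)
reordered = run(["DELETED", "ADDED"], "deleting", False)

# A late trigger while creation is still pending leaves the status untouched.
still_pending = run(["DELETED"], "creating", False)
```

Both orderings converge on the same result because the event type is never consulted, which is the property an out-of-order delete would otherwise break.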

@sairameshv
Member

/assign

@sairameshv
Member

Here is the web sequence diagram of the existing stateful Daemonset behavior:

[Diagram: Stateful Daemonset]

The following is the proposed design, in which the Daemonset no longer updates the allocation status of the Instaslice object:

[Diagram: Stateless Daemonset]

@asm582, let me know your thoughts on the new workflow.

@asm582
Contributor Author

asm582 commented Nov 27, 2024

Thanks, @sairameshv. Excellent summary; yes, this is what we want to achieve. We should continue to follow the design of creating before deleting. @harche mentioned out-of-order event triggers, such as a config map delete event arriving before the corresponding create event, which we need to handle in the controller logic.

3 participants