Support nic-cluster-policy update #73

Open
2 of 3 tasks
abdallahyas opened this issue Dec 13, 2020 · 2 comments
Labels
enhancement New feature or request priority-medium

Comments

@abdallahyas
Contributor

abdallahyas commented Dec 13, 2020

Currently, the network operator only supports updating component versions; any other change is ignored until either the component pod is restarted or the nic-cluster-policy is redeployed. The update logic needs to handle:

  • Support deleting a component (i.e. deleting its DaemonSet) when the nic-cluster-policy is updated to no longer include that component. This is most visible in Helm-based deployments, where the value of <component>.deploy is changed from true to false.
  • Support changing the nic-cluster-policy configuration (other than the version) without redeploying the nic-cluster-policy. This mainly applies to the rdma-shared-dev-plugin, whose pod must be restarted to pick up the new values, but it can be extended to other components as well, such as the Whereabouts IPAM plugin.
  • Support updating both the mofed and the nv-peer-mem images without breaking running workloads.
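Items 1 and 2 above could be handled by a single reconcile step that compares the desired policy with what is currently deployed. The following is a minimal, stdlib-only sketch, not the network operator's actual code: ComponentSpec, DeployedComponent, Plan, reconcile, and configHash are hypothetical names, and recording a hash of each component's spec on the deployed object is an assumption about how a non-version configuration change could be detected.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// ComponentSpec is a hypothetical, simplified view of one component entry
// in the nic-cluster-policy (e.g. the RDMA shared device plugin).
type ComponentSpec struct {
	Image   string            `json:"image"`
	Version string            `json:"version"`
	Config  map[string]string `json:"config,omitempty"`
}

// DeployedComponent is a hypothetical record of what is running now:
// the DaemonSet name and the spec hash it was created from (assumption).
type DeployedComponent struct {
	DaemonSet  string
	ConfigHash string
}

// configHash returns a stable hash of a spec. encoding/json sorts map keys,
// so equal specs always produce equal hashes.
func configHash(spec ComponentSpec) string {
	b, _ := json.Marshal(spec) // cannot fail for this struct
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Plan lists the actions one reconcile pass would take.
type Plan struct {
	Delete []string // DaemonSets whose component was removed (item 1)
	Create []string // components newly added to the policy
	Update []string // components whose spec changed, so pods must be recreated (item 2)
}

func reconcile(desired map[string]ComponentSpec, deployed map[string]DeployedComponent) Plan {
	var p Plan
	for name, cur := range deployed {
		if _, ok := desired[name]; !ok {
			p.Delete = append(p.Delete, cur.DaemonSet)
		}
	}
	for name, spec := range desired {
		cur, ok := deployed[name]
		switch {
		case !ok:
			p.Create = append(p.Create, name)
		case cur.ConfigHash != configHash(spec):
			p.Update = append(p.Update, name)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(p.Delete)
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	return p
}

func main() {
	oldRDMA := ComponentSpec{Image: "rdma-shared-dev-plugin", Version: "v1", Config: map[string]string{"resourceName": "rdma_a"}}
	newRDMA := oldRDMA
	newRDMA.Config = map[string]string{"resourceName": "rdma_b"}

	// Updated policy: whereabouts removed (e.g. its Helm deploy value set to
	// false) and the RDMA device plugin config changed.
	desired := map[string]ComponentSpec{"rdma-shared-dev-plugin": newRDMA}
	deployed := map[string]DeployedComponent{
		"rdma-shared-dev-plugin": {DaemonSet: "rdma-shared-dp-ds", ConfigHash: configHash(oldRDMA)},
		"whereabouts":            {DaemonSet: "whereabouts-ds", ConfigHash: configHash(ComponentSpec{Image: "whereabouts", Version: "v1"})},
	}
	fmt.Printf("%+v\n", reconcile(desired, deployed))
}
```

In this example the plan deletes the whereabouts DaemonSet and marks the RDMA device plugin for an update, while a version-only change would fall into the same Update bucket.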

@adrianchiris
Collaborator

Another important use case for updates is upgrading the mofed and nv-peer-mem driver containers.
This part is tricky, because it re-creates all Mellanox net-devices, which may cause workloads to stop working.
Since this happens at cluster scale, we need to think carefully about how to perform it.
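One possible way to bound the impact of a driver-container upgrade is to roll it out in small node batches, finishing and verifying each batch before starting the next. The sketch below only computes the batches; upgradeBatches and maxUnavailable are hypothetical names, and the drain and health-check steps mentioned in the comments are assumptions rather than something this thread decides on.

```go
package main

import "fmt"

// upgradeBatches splits nodes into consecutive batches of at most
// maxUnavailable nodes. Upgrading one batch at a time limits how many nodes
// lose their Mellanox net-devices at the same moment.
func upgradeBatches(nodes []string, maxUnavailable int) [][]string {
	if maxUnavailable < 1 {
		maxUnavailable = 1 // always make progress, one node at a time
	}
	var batches [][]string
	for start := 0; start < len(nodes); start += maxUnavailable {
		end := start + maxUnavailable
		if end > len(nodes) {
			end = len(nodes)
		}
		batches = append(batches, nodes[start:end])
	}
	return batches
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3", "node-4", "node-5"}
	for i, b := range upgradeBatches(nodes, 2) {
		// Assumption: a real operator would drain these nodes, recreate the
		// driver pods, and confirm the net-devices are back before moving on.
		fmt.Printf("batch %d: %v\n", i+1, b)
	}
}
```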

@adrianchiris
Collaborator

Items 1 and 3 are handled; only item 2 remains.

Item 2 can be done by either:

  1. restarting the device plugin, or
  2. adding watch logic to the RDMA shared device plugin (this depends on whether ConfigMap updates are reflected in the pod's mounts, which still needs to be validated).

@adrianchiris adrianchiris added enhancement New feature or request priority-medium labels May 22, 2023
Projects
None yet
Development

No branches or pull requests

2 participants