Support for persistent node labels for node stop/restart #222
Comments
Adding myself as an assignee as I had started on the second item mentioned in #222 (comment): unlocking the node_controller.go functionality.
I finally got back to uncommenting and updating the node_controller.go code, but quickly discovered that node_controller and the current shim_controller step on each other's toes. The shim controller already watches for node label changes, which apparently also covers new nodes registering with a cluster. So today, without any node_controller.go code, I believe we already have the functionality to handle this issue, i.e. support for new nodes that already carry the appropriate label per a given Shim's nodeSelector config.
I added a GH workflow test to help verify this; see a sample run here. I've also verified that RCM works today in conjunction with AKS VMSS. When I label a node pool with the appropriate shim nodeSelector (in my case,
If RCM today watches for node label events, including new nodes, I wonder if we can do without node_controller.go entirely. Perhaps this is what the original authors had in mind, hence the code being commented out. Is there any other node management functionality that RCM needs that would warrant a node_controller?
cc @voigt @Mossaka @phyrog
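For readers following along, the check described above (does a node, new or existing, carry the labels named in a Shim's nodeSelector?) boils down to an equality match on label key/value pairs. Below is a minimal Go sketch of that idea. It is not taken from the RCM code base: the function name matchesNodeSelector is hypothetical, the selector is assumed to be a plain string map, and the sample label is the one quoted in the issue description.

```go
package main

import "fmt"

// matchesNodeSelector is a hypothetical helper (not the RCM implementation).
// It reports whether every key/value pair in selector is present in the
// node's labels. In this sketch an empty selector matches every node.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		got, ok := nodeLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Assumed selector, using the label from the issue description.
	selector := map[string]string{"kwasm.sh/kwasm-node": "true"}

	// A node that registers with the label already set, as happens when
	// the label is applied at the node pool level.
	freshNode := map[string]string{
		"kubernetes.io/os":    "linux",
		"kwasm.sh/kwasm-node": "true",
	}
	// A node that registers without the label.
	unlabeled := map[string]string{"kubernetes.io/os": "linux"}

	fmt.Println(matchesNodeSelector(freshNode, selector)) // true
	fmt.Println(matchesNodeSelector(unlabeled, selector)) // false
}
```

Because the match only looks at the node's current labels, it gives the same answer for a node that was just created and for a long-lived node whose labels changed, which is why a single watch on node label events could plausibly cover both cases.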
Context: In AKS clusters, when the cluster is stopped and restarted, new nodes are provisioned as part of the VMSS design. This leads to the loss of Kubernetes annotations on the nodes (i.e. kwasm.sh/kwasm-node=true), which impacts configurations that rely on those annotations. The current behavior of runtime-class-manager is to mutate containerd configurations based on annotations that are manually applied to nodes. However, since annotations are not persistent across cluster restarts, this causes issues with retaining the necessary configuration.
ref: spinkube/azure#24
Proposal:
Labels such as kwasm.sh/kwasm-node=true should be used instead of annotations for the runtime-class-manager to identify and mutate node configurations. Unlike annotations, labels are persistent across cluster restarts and node reallocation.
The runtime-class-manager should be enhanced to detect and reapply configurations based on node labels automatically, without requiring manual intervention. This would ensure configurations persist across node restarts.
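To make "detect and reapply configurations based on node labels automatically" a bit more concrete, here is a small Go sketch of the decision a controller could make on each node event. Everything in it is an assumption for illustration: the Action type, the decide function, and in particular the idea that removing the label should trigger an uninstall are hypothetical and not confirmed behavior of runtime-class-manager.

```go
package main

import "fmt"

// Action is a hypothetical name for what a controller would do with a node.
type Action string

const (
	NoOp      Action = "no-op"
	Install   Action = "install"
	Uninstall Action = "uninstall" // assumed behavior, not confirmed for RCM
)

// hasLabel reports whether labels contains key with exactly this value.
// Reading from a nil map is safe in Go and simply reports false.
func hasLabel(labels map[string]string, key, value string) bool {
	got, ok := labels[key]
	return ok && got == value
}

// decide compares a node's labels before and after an event.
// For a newly registered node, pass nil as oldLabels.
func decide(oldLabels, newLabels map[string]string, key, value string) Action {
	was := hasLabel(oldLabels, key, value)
	is := hasLabel(newLabels, key, value)
	switch {
	case !was && is:
		return Install
	case was && !is:
		return Uninstall
	default:
		return NoOp
	}
}

func main() {
	labeled := map[string]string{"kwasm.sh/kwasm-node": "true"}

	// A replacement node that comes up after a cluster restart with the
	// label already applied has no previous state.
	fmt.Println(decide(nil, labeled, "kwasm.sh/kwasm-node", "true")) // install

	// An unrelated label update on an already configured node.
	fmt.Println(decide(labeled, labeled, "kwasm.sh/kwasm-node", "true")) // no-op
}
```

Treating a brand-new node as a transition from "no labels" to "labeled" means the restart case in the Context section needs no special handling in this sketch: the replacement node is configured the first time it is observed with the label.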
Lastly, the legacy code in node_controller.go needs to be cleaned up.
cc @vdice @voigt