[Feature] offloading to nodes with specific labels #1494
Can this be done here? https://github.com/liqotech/liqo/blob/master/cmd/virtual-kubelet/root/root.go#L145

```go
nodeRunner, err := node.NewNodeController(
    nodeProvider, nodeProvider.GetNode(),
    localClient.CoreV1().Nodes(), // add nodeselector label here
    node.WithNodeEnableLeaseV1(localClient.CoordinationV1().Leases(corev1.NamespaceNodeLease), int32(c.NodeLeaseDuration.Seconds())),
    ...
```
Hi @DevSusu,

If I understand it correctly, you would like to specify a node selector to offer only a subset of the resources available in the provider cluster (i.e., those associated with the nodes matching the selector).

This feature makes sense to me (and also relates to excluding the tainted control-plane nodes from the computation); it would require some modifications in the resource computation logic and in the shadow pod controller, to inject the given node selectors into offloaded pods. I cannot give you any timeline for this right now, but I'll add it to our roadmap. If you would like to contribute, I can also give you more information about where to extend the logic to introduce it.

As for the piece of code you mentioned, that is the controller which deals with the creation of the virtual node. Still, the amount of resources associated with that node is taken from the …
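To illustrate the "inject the given node selectors into offloaded pods" part, here is a minimal sketch of merging a selector into a pod spec. The helper name and the `capacity=spot` label are made up for the example; this is not the actual Liqo shadow pod controller logic.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// injectNodeSelector merges the given labels into a pod spec's nodeSelector.
// Hypothetical helper, not existing Liqo code.
func injectNodeSelector(spec *corev1.PodSpec, selector map[string]string) {
	if spec.NodeSelector == nil {
		spec.NodeSelector = make(map[string]string, len(selector))
	}
	for k, v := range selector {
		spec.NodeSelector[k] = v
	}
}

func main() {
	spec := corev1.PodSpec{
		Containers: []corev1.Container{{Name: "app", Image: "nginx"}},
	}
	// "capacity=spot" is just a placeholder label taken from the issue description.
	injectNodeSelector(&spec, map[string]string{"capacity": "spot"})
	fmt.Println(spec.NodeSelector) // map[capacity:spot]
}
```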
@giorio94 thanks for the prompt response, I really appreciate it, and thanks for the summary 😄

I would like to contribute; it would be great if you could give some starting points!
Nice to hear that! In the following you can find some additional pointers:
Feel free to ask for any further information.
@giorio94, thanks for the pointers. I've skimmed through them, and have a question/suggestion.

Instead of caching the pod info, what about managing pod informers per node? When the node informer reports that a new node has been added, we register a pod informer scoped to that node.

This way, we don't need to worry about the timing issue you mentioned. Caching the pod info requires some guessing about how long we should wait until the node info comes in, and that delay would also affect the virtual node resource update period.
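A minimal sketch of that idea with plain client-go follows; the helper name, the hardcoded node name, and the kubeconfig wiring are hypothetical, and in Liqo this would have to hook into the existing resource monitoring logic instead.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

// startPodInformerForNode registers a pod informer restricted to a single
// node through a field selector on spec.nodeName (hypothetical helper).
func startPodInformerForNode(client kubernetes.Interface, nodeName string, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			// Here the pod's requests would be accounted against this node.
			fmt.Printf("pod %s/%s runs on %s\n", pod.Namespace, pod.Name, nodeName)
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}

func main() {
	// Placeholder wiring: build a clientset from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	stopCh := make(chan struct{})
	// In the real logic this call would be driven by node informer Add events.
	startPodInformerForNode(client, "worker-1", stopCh)
	select {} // keep the informers running (sketch only)
}
```

A side effect of the field selector is that the API server only streams pods of that node, so the watch traffic stays proportional to the nodes actually matching the label selector.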
To me, the approach you propose definitely makes sense, and it also reduces the number of pods observed by the informers in case most nodes are excluded by the label selector.

As for the caching approach, you could avoid the guessing through a refactoring of the data structure towards a more node-oriented design (i.e., storing the resources used by each peered cluster per physical node, rather than as a whole), and then marking whether a given node shall be included or excluded. This would also make it possible to cover the case in which node labels are modified, changing whether the node matches the selector.

I personally have no particular preference. I feel your proposal is a bit cleaner, although it also requires some more work/refactoring to integrate it with the existing code (the data structure changes are probably needed nonetheless, to account for cleanup when a node is removed).
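A very rough sketch of what such node-oriented bookkeeping could look like; all types and names here are hypothetical, just to illustrate the idea of tracking inclusion and usage per physical node.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// nodeUsage tracks, for a single physical node, whether it currently matches
// the label selector, its allocatable resources, and what offloaded pods use.
type nodeUsage struct {
	Matches     bool
	Allocatable corev1.ResourceList
	Used        corev1.ResourceList
}

// clusterUsage stores the per-node view for one peered cluster, so that a
// node being added, removed, or relabeled only touches a single entry.
type clusterUsage struct {
	Nodes map[string]*nodeUsage // keyed by node name
}

// Offerable returns the resources to advertise on the virtual node:
// allocatable minus used, summed only over nodes matching the selector.
func (c *clusterUsage) Offerable() corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, n := range c.Nodes {
		if !n.Matches {
			continue
		}
		for name, alloc := range n.Allocatable {
			avail := alloc.DeepCopy()
			if used, ok := n.Used[name]; ok {
				avail.Sub(used)
			}
			q := total[name]
			q.Add(avail)
			total[name] = q
		}
	}
	return total
}

func main() {
	c := clusterUsage{Nodes: map[string]*nodeUsage{
		"worker-1": {
			Matches:     true,
			Allocatable: corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("4")},
			Used:        corev1.ResourceList{corev1.ResourceCPU: resource.MustParse("1")},
		},
		"worker-2": {Matches: false}, // excluded by the label selector
	}}
	offer := c.Offerable()
	fmt.Println(offer.Cpu()) // 3
}
```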
Fixed by the offloading patch feature. See the docs.
Is your feature request related to a problem? Please describe.
I've walked through the examples and documentation, and would like to use Liqo for our team's multi-cluster management. Our use case and situation can be summarized as follows:
- `eks-gpu`: offloads to EKS nodes with the `nvidia.com/gpu=true` label
- `eks-spot`: offloads to EKS nodes with the `capacity=spot` label

Describe the solution you'd like
Currently, it seems that the created virtual node summarizes all nodes in the remote cluster.

I suggest using nodeSelector labels when offloading, so that the virtual node reflects only the nodes matching the selector. Injecting a `nodeSelector` term into offloaded pods would also be useful (but it is OK for that to be done outside of Liqo).
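To make the intent concrete, here is a rough client-go sketch of "summarize only the matching nodes". The `capacity=spot` selector and the kubeconfig wiring are placeholders; this is not how Liqo actually computes the virtual node's resources.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder wiring: build a clientset from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List only the nodes matching the label selector (placeholder label).
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "capacity=spot",
	})
	if err != nil {
		panic(err)
	}

	// Sum the allocatable resources of the matching nodes: roughly what the
	// virtual node would advertise under the proposed behavior.
	total := corev1.ResourceList{}
	for _, n := range nodes.Items {
		for name, alloc := range n.Status.Allocatable {
			q := total[name]
			q.Add(alloc)
			total[name] = q
		}
	}
	fmt.Println(total.Cpu(), total.Memory())
}
```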
Describe alternatives you've considered

I can write a mutating webhook in the EKS cluster to inject the `nodeSelector` term, but then the virtual node carries too much unnecessary (or even confusing) information: it reports enough resources (CPU, memory), yet pods fail to schedule.

Additional context
This feature can also help in multi-tenant scenarios, where you might not want to dedicate a cluster to every offloaded namespace:
- `tenant1` namespace (on-premise) and `tenant1-eks` namespace (offloading)
- `tenant2` namespace (on-premise) and `tenant2-eks` namespace (offloading)

In #1249, @giorio94 suggested creating a local shadow node for each remote node. The `nodeSelector` feature would help that scenario too.