Use a server group to ensure anti-affinity for control plane nodes #1256
Comments
I know OCP seems to have this by default. Are you suggesting we should default it in our template, or directly in the code?
IMHO, it would be nice if we could manage a server group for the control plane nodes. It would be better for the "Infrastructure-as-Code" approach to have the whole thing defined in Kubernetes resources, rather than having to make the server group ahead of time.
We should probably at least document it in the CAPO book as well.
Yes, we have such a definition in api/v1alpha5/openstackmachine_types.go.
I think maybe we can follow other implementations, like what we did for security groups: a boolean variable to control whether we create one for the customer.
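For reference, a minimal sketch of how a pre-created server group is referenced through the machine spec today (field names as discussed above; the flavor, image and UUID values are placeholders):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
kind: OpenStackMachineTemplate
metadata:
  name: example-control-plane
spec:
  template:
    spec:
      flavor: m1.large                  # placeholder
      image: ubuntu-2004-kube           # placeholder
      # UUID of a server group created ahead of time (e.g. with a
      # soft-anti-affinity policy); CAPO does not create or delete it.
      serverGroupID: 00000000-0000-0000-0000-000000000000
```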
For the control plane we certainly have the opportunity to create and manage this server group: the server group will have the same lifecycle as the cluster, so it can be managed with the cluster.

For machines in a MachineSet it is not currently possible to do this in CAPI as far as I'm aware. I have previously mooted the possibility of some CAPI object whose lifecycle spans multiple machines. The best suggestion so far is something along the lines of MachinePools, however this would be an abuse of that interface IMHO, as OpenStack has no such concept. An object with the same lifecycle as a MachineSet would probably be sufficient: you'd need a new MachineSet for every server group, but this doesn't sound unreasonable to me.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
I'd like to re-open this issue and help implement this. We are a cloud provider implementing OpenStack Magnum managed CAPO clusters, and these must have anti-affinity for the control plane nodes. The preferred default would be a soft anti-affinity policy.

Proposal:
Note: this would need to apply to control plane nodes only, so perhaps it should be named with a control-plane-specific prefix. I note the restrictions of CAPI around creating the server group for MachineSets; perhaps that is not in the first pass, but it could be extended later. A rough sketch of one possible shape is below.
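A hypothetical sketch of what such a cluster-level setting could look like (the `controlPlaneServerGroup` field and its placement are illustrative only; this is not an existing CAPO API):

```yaml
# Illustrative only: a cluster-level knob asking CAPO to create and manage a
# server group, then apply it to every control plane machine.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6   # apiVersion illustrative
kind: OpenStackCluster
metadata:
  name: example
spec:
  cloudName: openstack
  # Hypothetical field, not part of the current API:
  controlPlaneServerGroup:
    policy: soft-anti-affinity
```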
Related issue: #808
/reopen
@dalees: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen |
@jichenjc: Reopened this issue.
@dalees Using pre-existing server groups is possible in the Helm charts already, so we could actually implement this fully in the Magnum driver if required. I still think it would be nice to do in CAPO though.

For the CP nodes it is fairly obvious that we can just add a control plane affinity policy to the cluster spec. For the workers it is less obvious, as what you probably want is a server group per MD, but we only get to deal in machines in CAPO, and we aren't supposed to care which MD they are in.

@mdbooth Didn't you have ideas for reworking all the AZ logic? Maybe server groups could be worked into that?
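For illustration, until CAPO can create groups itself, a server group per MachineDeployment can be approximated today by pre-creating the groups and giving each MD its own machine template referencing its own group. Names and UUIDs below are placeholders:

```yaml
# Sketch only: one pre-created server group per MachineDeployment, referenced
# via the existing serverGroupID field. Each MachineDeployment's
# infrastructureRef points at its own template.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
kind: OpenStackMachineTemplate
metadata:
  name: workers-md-0
spec:
  template:
    spec:
      flavor: m1.medium
      image: ubuntu-2004-kube
      serverGroupID: 11111111-1111-1111-1111-111111111111
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
kind: OpenStackMachineTemplate
metadata:
  name: workers-md-1
spec:
  template:
    spec:
      flavor: m1.medium
      image: ubuntu-2004-kube
      serverGroupID: 22222222-2222-2222-2222-222222222222
```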
Server groups won't live in the AZ logic when that (hopefully) happens. The reason is that AZs are basically a templating mechanism: they define a set of things which vary between machines. Server groups don't (must not) vary between members.

I would like to see us support server groups better than we do currently, which, as Matt says, is to just create one externally and reference it in the machine spec. I believe we will need a solution for this soon for OpenShift integration, so it will likely be on my TODO list... some time in the next 6 months, maybe? If somebody else wanted to implement it sooner I'd be delighted.

I don't want to add any more 'cluster-magic' to the machine controller, i.e. my preference is that the machine spec entirely specifies the machine, without reference to the cluster spec. So while I'd be happy for the cluster controller to create the server group, I'd still want the control plane machine spec to reference it explicitly.
So, could this be a good time to have an extra CRD for an OpenStackServerGroup?
@EmilienM - Great, if you'd like to take this forward and progress the CRD and controller for this, that'd be welcomed. I am prepared to write the control plane part of this, but not ready to dedicate the time to write the CRD/controller proposal and implementation.

I may do the control plane part for my own learning and add a PR in the next few weeks. I don't expect this to be merged if a CRD based replacement is in the pipeline :) Happy to follow, give feedback and test.

Having this functional for node groups would be great. I can see a good use case for a different server group per node group, matching @nikParasyr's case 3.
Personally, I really like the idea of an OpenStackServerGroup CRD.
Just to clarify, I think users of CAPO should create instances of the OpenStackServerGroup CRD themselves. So the order of precedence for a machine would be:
CAPO will manage the actual OpenStack server group backing an OpenStackServerGroup. Does that make sense? @mdbooth?
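Based on the naming used in the PR description later in this thread, a sketch of how this could look (the schema shown here is illustrative, not a merged API):

```yaml
# Hypothetical: a managed server group resource plus a machine template that
# references it. Kind and field names follow the PR description in this
# thread; the exact schema is a sketch.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha8
kind: OpenstackServerGroup
metadata:
  name: control-plane-anti-affinity
spec:
  policy: soft-anti-affinity
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha8
kind: OpenStackMachineTemplate
metadata:
  name: example-control-plane
spec:
  template:
    spec:
      flavor: m1.large                  # placeholder
      image: ubuntu-2004-kube           # placeholder
      # Proposed reference to the managed server group above; CAPO would
      # resolve it to a Nova server group UUID at VM creation time.
      serverGroupRef:
        name: control-plane-anti-affinity
```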
This issue is now important to my current work. I'm planning to work on this next week and will propose a PR implementing the new CRD for OpenstackServerGroup. @EmilienM, please let me know if you have progressed this; otherwise I'm happy to pick this up now.
Quick update: I plan to upload a PR for feedback next week. I have a working version of the new OpenstackServerGroup CRD and controller.
I'm honestly not sure we want a new CRD & controller for this, as it might be too complex for our needs, but I'm happy to be proven wrong.
Implements new CRD for OpenstackServerGroup in v1alpha8 to allow managed Server Groups with standard policies, and adds ServerGroupRef to OpenstackMachine that references the new CRD and uses it for VM creation. Closes: kubernetes-sigs#1256
I've uploaded a first pass at this; there are a few TODOs listed, but it is functional.

On the topic of it being a CRD & controller: I can see this is fairly heavyweight for a simple feature, but it does provide a solution to all use cases in #1256 (comment) and deals with the lifecycle cleanly. That's the main benefit of this.

Following our earlier discussion in this thread, can you think of other options? How do we decide the way forward? I'm happy to join the next CAPO meeting to discuss.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Implements new CRD for OpenstackServerGroup in v1beta1 to allow managed Server Groups with standard policies, and adds ServerGroupRef to OpenstackMachine that references the new CRD and uses it for VM creation. Closes: kubernetes-sigs#1256
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to its lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind feature
Describe the solution you'd like
We should put control plane nodes in a server group with the soft anti-affinity policy to decrease the risk that a hypervisor failure takes out the whole control plane.
This becomes more important once #1252 is merged, but is something that I think we should do even when using explicit AZs.
Anything else you would like to add: