-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[occm] multizone race condition #2590
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @sergelogvinov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
ff0fb62
to
6016b87
Compare
6016b87
to
745d91c
Compare
pkg/openstack/instancesv2.go
Outdated
klog.V(4).Infof("Instance %s should initialized first", node.Name) | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked the call-sites, but my initial thought is that this should return an error rather than true. It feels like we are potentially asserting that the instance exists when in fact we don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloud-provider does nothing if we return positive (it's alive and exist) response. Otherwise it delete/taint the node. The CCM in another region can make right decision.
Than ProviderID is empty - yep, better return an error.
If ProviderID is not openstack magic string - better do nothing. Errors will create noisy logs.
AzureCCM does the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This providerID topic was discussed recently, it seems there is the assumptions for cloud providers to set it, so new nodes will wait until the providerID is set unless the cloud provider indicates is not implemented ,kubernetes/kubernetes#123713
Otherwise the node is not going to be initialized
pkg/openstack/instancesv2.go
Outdated
@@ -79,6 +79,18 @@ func (os *OpenStack) instancesv2() (*InstancesV2, bool) { | |||
|
|||
// InstanceExists indicates whether a given node exists according to the cloud provider | |||
func (i *InstancesV2) InstanceExists(ctx context.Context, node *v1.Node) (bool, error) { | |||
if i.regionProviderID { | |||
if node.Spec.ProviderID == "" { | |||
klog.V(4).Infof("Instance %s should initialized first", node.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
klog.V(4).Infof("Instance %s should initialized first", node.Name) | |
klog.V(4).Infof("Node %s should be initialized first", node.Name) |
Incidentally, it looks like you've started moving to structured logging in at least the loadbalancer code.
This might be written as a structured log instead:
klog.V(4).InfoS("Node is not yet initialised", "node", klog.KObj(node))
// Return true if instance is not openstack. | ||
func instanceNodeUnmanaged(providerID string) bool { | ||
if providerID != "" { | ||
return strings.Contains(providerID, "://") && !strings.HasPrefix(providerID, ProviderName+"://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the first assertion (strings.Contains(providerID, "://")
)buy us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instanceIDFromProviderID
adds openstack://
if providerID has string like /instance-id
.
We are checking whether the providerID is in the old style or not. If yes - node is ours (backward compatibility)
|
||
if instanceNodeUnmanaged(node.Spec.ProviderID) { | ||
klog.V(4).Infof("Instance %s is not an OpenStack instance", node.Name) | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, is true
the correct return value here? I'd have to check the call-sites, but my feeling is that this could lead to incorrect behaviour possibly today, but if not in the future. My feeling is that we should return an error here.
pkg/openstack/instancesv2.go
Outdated
if node.Spec.ProviderID == "" { | ||
klog.V(4).Infof("Instance %s should initialized first", node.Name) | ||
return false, nil | ||
} | ||
|
||
if instanceNodeUnmanaged(node.Spec.ProviderID) { | ||
klog.V(4).Infof("Instance %s is not an OpenStack instance", node.Name) | ||
return false, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above re structured logging and returning a non-error result.
33c3917
to
68fbb88
Compare
We cannot definitively determine whether a node exists within our zone without the provider ID string.
68fbb88
to
a2ca362
Compare
I feel like the underlying problem here is that Would it be safer and less prone to hard-to-fix future edge cases to address that instead? |
My changes based on these thoughts: If we decide to add a new region into the existing cluster, we'll need to redeploy the CCM with the CCM will use default region (regionA) for nodes lacking a region in their ProviderID. |
Got it. However, this would still be baking logic into the CCM based on a particular deployment strategy. I think it would still address your use case if we, e.g. updated getInstance to be multi-region aware in the event that providerID is not set. This would also mean that we would not be asserting that an instance exists when we don't know that it does. I can't help but feel that the fact this this happens not to break anything currently is not something we should rely on indefinitely. |
IIUC, the situation you're describing is one where we have multiple CCMs: one per region. The motivation for doing this is... OpenStack CCM is not multi-region aware? Is there any other reason? My initial thought was that it would be required for HA, but as long as your control plane is multi-region, a failure of a complete region where the CCM is running would result in the CCM running in another region. It's not to reduce east-west traffic, either, because the CCM is only talking to KAS, and etcd data is presumably replicated across your control plane anyway. That isn't quite where I thought I was going with this comment, btw. I just thought I'd state the motivation for wanting to do this first, only to realise I didn't understand it. Regardless, I think this is fundamentally a design question of supporting multiple multiple CCMs in a single cluster. I wonder if the node folks have any existing guidance on this. I vaguely recall having discussed this before. My impressions from that discussion, which would have been around the time of the external cloud-provider split, where:
Still, I wonder if it's worth reaching out to those folks. Potential outcomes:
Not sure what the forum to raise this is, tbh. I wonder if @aojea, @danwinship, or @andrewsykim could signpost us appropriately? |
// A providerID is build out of '${ProviderName}:///${instance-id}' which contains ':///'. | ||
// or '${ProviderName}://${region}/${instance-id}' which contains '://'. | ||
// See cloudprovider.GetInstanceProviderID and Instances.InstanceID. | ||
func instanceIDFromProviderID(providerID string) (instanceID string, region string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems this is just moving code from one place to another, can you make this in a different commit so it simplifies the review?
} | ||
|
||
// Return true if instance is not openstack. | ||
func isNodeUnmanaged(providerID string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is the new change, the notion of "unmanaged node"
@@ -79,9 +79,20 @@ func (os *OpenStack) instancesv2() (*InstancesV2, bool) { | |||
|
|||
// InstanceExists indicates whether a given node exists according to the cloud provider | |||
func (i *InstancesV2) InstanceExists(ctx context.Context, node *v1.Node) (bool, error) { | |||
if i.regionProviderID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this concept, it seems that this capability makes the controller to skip the nodes considered unmanaged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that, the regionProviderID is an alpha feature, meaning that all code and logic within this block can be improved in the future, if i miss something.
I agree with you. Multi-region or hybrid cloud setups introduce complexity. Determining the affiliation of uninitialized nodes becomes challenging. I've conducted extensive research and experimented with various setups, including multi-region or hybrid Kubernetes clusters. For instance https://github.com/sergelogvinov/terraform-talos?tab=readme-ov-file#multi-cloud-compatibility Currently, I've been working on a branch OCCM that supports multi-region OpenStack setups, particularly for PS. I've developed my own multi-regional implementation of CCM/CSI for Proxmox, Proxmox CSI Plugin and Proxmox Cloud Controller Manager. After about a year of production experience with my implementation, I'm confident in its viability. |
definitely a topic to discuss in sig-cloud-provider , we recently had a long discussion on the provider ID topic that seems to not be well specified kubernetes/kubernetes#123713 so I will also validate all the assumptions so we not make any assumption now that is not correct |
this changes affect only PS. With this patch a node without a ProviderID can never be deleted... |
The architecture here is:
Is that right? Do the per-region CCMs actually shard the Nodes, or do we just ensure the responses from InstancesV2 are such that the controller will never take any action for a Node which is not in its region? |
Have you considered having kubelet set the providerID on Node creation, btw? Here's how CAPO does this: This is used by kubeadm when it creates the service which runs kubelet. It gets the instance-id from the local metadata service. Here's how we do it in OpenShift... for AWS unfortunately but the principal is the same (I'm convinced we did this for OpenStack too... may have to fix that 🤔): That unit runs on boot and sets a Unfortunately I don't think OpenStack |
Coming back to this patch specifically, I think I'd be more comfortable proposing a patch in k/k similar to kubernetes/kubernetes#123713 which causes the node-lifecycle controller to have some explicit behaviour (e.g. ignore) for Nodes which don't have a providerID. I continue to worry that in this PR we are mis-reporting to cloud-provider on the basis that we expect cloud-provider to have a certain behaviour. I think that will be fragile and hard to maintain. |
/ok-to-test |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/test pull-cloud-provider-openstack-check |
/remove-lifecycle stale |
Hi @mdbooth , I try to used CAPO in multi-cloud environment and you're right I don't encounter this race condition scenario since provider-id is set by kubeadm, I create different template one per cloud with region hardcoded like that: ---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
name: pool-region-one
annotations:
controlplane.cluster.x-k8s.io/skip-coredns: ""
controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
template:
spec:
joinConfiguration:
nodeRegistration:
kubeletExtraArgs:
cloud-provider: external
provider-id: openstack://region-one/'{{ instance_id }}'
... This work pretty well with multiple OCCM deployed with variables However, CAPO don't handle region in magic string provider-id in his CRD: kubectl get openstackmachine
NAME CLUSTER INSTANCESTATE READY PROVIDERID MACHINE AGE
control-plane-6r5xs capoc ACTIVE true openstack:///87db7660-bc3b-48fd-b5c9-0fa2b2816eed control-plane-6r5xs 3h14m
pool-region-one-66rgq-t5c72 capoc ACTIVE true openstack:///f48f60bd-bae1-4ed9-b58f-aa1d90aa6579 pool-region-one-66rgq-t5c72 154m And he don't create other node because he is stuck on waiting this first one because CRD's provider-id don't containt region name and don't match kubernetes node I think that we simply have to add region in provider-id here https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/controllers/openstackmachine_controller.go#L425C1-L425C112 . I am looking to find a way to implement that correctly in CAPO. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
We cannot definitively determine whether a node exists within our zone without the provider ID string.
What this PR does / why we need it:
It affects if
OS_CCM_REGIONAL
istrue
The node trying to join to the cluster. It has non ready status and uninitialized taint.
The
cloud-node
controller is attempting to initialize the node simultaneously with thenode-lifecycle
controller's attempt to delete the node, as the node is located in another regionSo we need to skip nodes without ProviderID in
OS_CCM_REGIONAL
mode.Azure does the same az.IsNodeUnmanaged -> [IsNodeUnmanagedByProviderID] (https://github.com/kubernetes-sigs/cloud-provider-azure/blob/master/pkg/provider/azure_wrap.go#L86) - if ProviderID is not azure string (or empty) - set it as unmanaged node.
Which issue this PR fixes(if applicable):
part of #1924
Special notes for reviewers:
Release note:
OS_CCM_REGIONAL
is alpha feature, we do not need to update release note.Feature flag was introduced here #1909