OCPBUGS-35256: Allow bootstrap node in existing instance groups #66
Conversation
Force-pushed from e0d8553 to 4cb47f1
/retest images

@patrickdillon: The following commands are available to trigger optional jobs:
Force-pushed from 4cb47f1 to 829f5ea
@patrickdillon: This pull request references Jira Issue OCPBUGS-35256, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
Force-pushed from 829f5ea to ba9e0bf
/jira refresh
@patrickdillon: This pull request references Jira Issue OCPBUGS-35256, which is invalid.
Force-pushed from ba9e0bf to ad8df54
/jira refresh
@patrickdillon: This pull request references Jira Issue OCPBUGS-35256, which is invalid.
/jira refresh
@patrickdillon: This pull request references Jira Issue OCPBUGS-35256, which is valid. 3 validations were run on this bug. Requesting review from QA contact.
While testing my changes to this PR, it was necessary to run the code referenced at line 165 in 539eb06.

As this builds, minimally, on top of another carry patch, please let me know if there's anything I can do commit-wise to make this easier for future carries/rebases.
@@ -166,7 +166,7 @@ func (g *Cloud) EnsureLoadBalancer(ctx context.Context, clusterName string, svc
 		return nil, err
 	}

-	klog.V(4).Infof("EnsureLoadBalancer(%v, %v, %v, %v, %v): ensure %v loadbalancer", clusterName, svc.Namespace, svc.Name, loadBalancerName, g.region, desiredScheme)
+	klog.V(4).Infof("debug EnsureLoadBalancer(%v, %v, %v, %v, %v): ensure %v loadbalancer", clusterName, svc.Namespace, svc.Name, loadBalancerName, g.region, desiredScheme)
We should either remove the debug prefix in all the logging calls or change the log level. I suspect this is left over from debugging work, though :)
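For context on this thread, a minimal sketch of how klog verbosity gating works (the message and values are illustrative, not the carry code); the point is that V(4) already marks a message as debug-level, so a literal "debug" prefix in the text is redundant:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// klog.InitFlags registers -v (and friends) on the default flag set;
	// flag.Parse makes a command-line -v=4 take effect.
	klog.InitFlags(nil)
	flag.Parse()
	defer klog.Flush()

	// Emitted only when verbosity is >= 4, so debug-level detail is already
	// gated by the level rather than by a prefix in the message body.
	klog.V(4).Infof("EnsureLoadBalancer(%v): ensure %v loadbalancer", "example-service", "internal")
}
```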
Oh right, I forgot to revendor after cleaning up my work. Thanks for catching this. Should be fixed now!
Can you refresh my memory: how many instance groups are there in total for the masters?
Force-pushed from ad8df54 to 58222a4
One for each master, so 3 instance groups. The bootstrap VM is included in the first instance group. See https://issues.redhat.com/browse/OCPBUGS-35256?focusedId=24931575&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-24931575
This code update is only relevant for private installs, in which case the control plane nodes only belong to a single instance group. Here is a screenshot from a Terraform install (note the bootstrap instance group), showing the VMs and their instance groups: [screenshot omitted]. Here is a screenshot for a CAPG-based install, where the bootstrap node shares an instance group with a master: [screenshot omitted].
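For anyone reproducing what the screenshots show, a hedged sketch of listing instance group membership with the GCE Go client; the project, zone, and group names are placeholders, and it assumes Application Default Credentials are configured:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

func main() {
	ctx := context.Background()

	// Uses Application Default Credentials.
	svc, err := compute.NewService(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder project/zone/group names, not from this PR.
	resp, err := svc.InstanceGroups.ListInstances(
		"my-project", "us-central1-a", "mycluster-abcde-master-us-central1-a",
		&compute.InstanceGroupsListInstancesRequest{InstanceState: "ALL"},
	).Do()
	if err != nil {
		log.Fatal(err)
	}

	// Each item carries a full instance URL; the last path segment is the VM
	// name, which is how a bootstrap VM sharing a group with a master shows up.
	for _, ins := range resp.Items {
		parts := strings.Split(ins.Instance, "/")
		fmt.Println(parts[len(parts)-1])
	}
}
```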
Ah, ok, this has to do with the instance groups being confined to a region/zone, and the bootstrap node will have to be in one of them. I'm ok with carrying this change in our carried logic. I see there was discussion about using kubernetes-sigs/cluster-api-provider-gcp#1266 instead. Usually I'd prefer to use upstream, but IMO expanding the CAPG API to accommodate this specific use case might be too broad of a change. I'm not a CAPG reviewer or maintainer, though.
@@ -668,7 +668,8 @@ func (g *Cloud) ensureInternalInstanceGroups(name string, nodes []*v1.Node) ([]s
 		parts := strings.Split(ins.Instance, "/")
 		groupInstances.Insert(parts[len(parts)-1])
 	}
-	if names.HasAll(groupInstances.UnsortedList()...) {
+	groupInstanceNames := groupInstances.UnsortedList()
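Only the first added line of the new condition is visible in this hunk. As a rough sketch of the relaxed check the commit message describes (the allHaveNodePrefix helper, names, and prefix below are illustrative, not the exact carry code):

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/sets"
)

// allHaveNodePrefix reports whether every instance in the group carries the
// cluster's node name prefix, so a bootstrap VM such as
// "mycluster-abcde-bootstrap" no longer disqualifies the group.
func allHaveNodePrefix(instances []string, nodePrefix string) bool {
	for _, name := range instances {
		if !strings.HasPrefix(name, nodePrefix) {
			return false
		}
	}
	return true
}

func main() {
	nodeNames := sets.New("mycluster-abcde-master-0")
	groupInstanceNames := []string{"mycluster-abcde-master-0", "mycluster-abcde-bootstrap"}

	// Strict check (pre-fix): every group member must be a known cluster node,
	// which the bootstrap VM is not.
	strict := nodeNames.HasAll(groupInstanceNames...)
	// Relaxed check (post-fix idea): members only need the node name prefix.
	usable := strict || allHaveNodePrefix(groupInstanceNames, "mycluster-abcde")

	fmt.Printf("strict=%v usable=%v\n", strict, usable) // strict=false usable=true
}
```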
I'm fine with the code as is, but since this is looking like it will be a long-term carry, a comment as to why we have these conditions will help reduce issue archeology in the future.
The motivation was included in the commit message; I just brushed it up again. Do you think that works, or do we also need an in-code comment?
This updates the logic from ac00767 for using existing instance groups to allow the bootstrap node to be present in the instance group. Prior to this commit, the logic to utilize existing instance groups checked to ensure the instance group contains only cluster nodes (which the bootstrap node is not), so if the bootstrap node is in the same instance group as one of the masters, the master would be added to a second instance group, which breaks things. We can resolve the issue by relaxing the logic to ensure that all instances in the instance group have the node prefix. Vendors the changes in order to be utilized by the cloud provider. Fixes OCPBUGS-35256
Force-pushed from 58222a4 to 2f97fd3
I tested this PR with an internal cluster and it seems to work; I am no longer seeing the issue. I think it's preferable to use this PR as opposed to adding knowledge of a bootstrap VM to kubernetes-sigs/cluster-api-provider-gcp#1266.
/lgtm
@patrickdillon: all tests passed!
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: nrb
@patrickdillon: Jira Issue OCPBUGS-35256: all pull requests linked via external trackers have merged. Jira Issue OCPBUGS-35256 has been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER] Distgit: ose-gcp-cloud-controller-manager
With the move to CAPG-based installs, the installer places the bootstrap node in the same instance group as one of the masters, due to limitations in CAPG. (In Terraform installs the bootstrap node has its own instance group.) Having the bootstrap node in the same instance group as a master exposes a bug in the logic of a patch we carry. As the commit message states:

"Prior to this commit, the logic to utilize existing instance groups checked to ensure the instance group contains only cluster nodes (which the bootstrap node is not), so if the bootstrap node is in the same instance group as one of the masters, the master would be added to a second instance group, which breaks things."
The logic for that check is implemented in cloud-provider-gcp/providers/gce/gce_loadbalancer_internal.go, lines 671 to 674 at 539eb06.
The issue is that the check only uses node names, and cloud-provider-gcp does not consider the bootstrap VM a node. We can resolve the issue by relaxing the logic to ensure that all instances in the instance group have the node prefix.