🐛 Reconcile etcd members on control plane scale down #265
Conversation
The current implementation passes the e2e tests covering a single-node upgrade and a scale-down-to-1 scenario. This approach preserves existing clusters, but for the fix to take effect, the control plane replicas will have to be scaled up by one node at some point. I will explore a different approach of passing the generated certificates to the rke2 server. That may be closer to upstream, but it would require all existing clusters to be re-created, or some etcd migration mechanism (such as rotating the RKE2 certificates to manually supplied ones, if that is supported).
@richardcase The presubmit e2e job passed without issues here :) And in under 30 minutes.
thanks a lot for taking care of this problem
The e2e tests are failing after the rebase and the output is very cryptic. I'm also seeing a bug in the (unchanged) RKE2 code related to …
- Disk pressure fix for kube-vip
Tests are green again. There were some issues with the code, but some of the remaining problems are in the CAPI framework: multiple MachineSets appear to be created per MachineDeployment in the upgrade scenario, and the test code is unable to distinguish between them. Kube-vip is not helpful, as its pods get evicted due to MemoryPressure on the child cluster nodes, and the default set of tolerations is for some reason ignored there. Tests may occasionally flake, because without a load-balancing solution the RKE2 agent may connect to a non-existent (recently removed) node. Another issue I observed with kube-vip is that leader election is never able to release the lock held by a dead pod on a node where the etcd instance is offline; this is likely a client-go issue. That said, the PR is ready to be merged, as the functionality is consistent with the description.
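For context on the MachineSet ambiguity mentioned above, here is a minimal sketch (not part of this PR, assuming only the standard Cluster API types and that picking the newest MachineSet is an acceptable heuristic) of how test code could select the MachineSet currently controlled by a MachineDeployment:

```go
package e2ehelpers

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newestMachineSetFor returns the most recently created MachineSet controlled
// by the given MachineDeployment. When an upgrade leaves several MachineSets
// behind, taking the newest one is one way for test code to pick the set the
// deployment is currently rolling out.
func newestMachineSetFor(ctx context.Context, c client.Client, md *clusterv1.MachineDeployment) (*clusterv1.MachineSet, error) {
	var list clusterv1.MachineSetList
	if err := c.List(ctx, &list, client.InNamespace(md.Namespace)); err != nil {
		return nil, err
	}

	var newest *clusterv1.MachineSet
	for i := range list.Items {
		ms := &list.Items[i]
		if !metav1.IsControlledBy(ms, md) {
			continue
		}
		if newest == nil || ms.CreationTimestamp.After(newest.CreationTimestamp.Time) {
			newest = ms
		}
	}
	if newest == nil {
		return nil, fmt.Errorf("no MachineSet owned by MachineDeployment %s/%s", md.Namespace, md.Name)
	}
	return newest, nil
}
```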
Looks ok to me.
@Danil-Grigorev - we should also follow up on the "Kube-vip is not helpful" comment.
Thanks, LGTM
kind/bug
What this PR does / why we need it:
This change makes it possible to establish connectivity to the child cluster's etcd members and to manage etcd membership during cluster scaling. Specifically, this is required when the cluster's etcd leader is removed by the scale-down procedure, since that causes the cluster API server to become unavailable and never come back online.
Therefore, the etcd leader needs to be moved just before node deletion is requested, and etcd membership has to be adjusted as well.
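As an illustration of that sequence (not the controller's actual code), here is a minimal sketch using the etcd clientv3 API. It assumes the client TLS config is already available and that etcd member names match node names, which is an assumption for this sketch:

```go
package etcdscaledown

import (
	"context"
	"crypto/tls"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// removeMemberBeforeNodeDeletion connects to the child cluster's etcd,
// transfers leadership away from the node being scaled down if needed,
// and then removes that node's etcd member.
//
// endpoints are client URLs of the remaining etcd members; nodeName is the
// machine that is about to be deleted. Member names are assumed to match
// node names here.
func removeMemberBeforeNodeDeletion(ctx context.Context, tlsCfg *tls.Config, endpoints []string, nodeName string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		TLS:         tlsCfg,
		DialTimeout: 10 * time.Second,
	})
	if err != nil {
		return fmt.Errorf("creating etcd client: %w", err)
	}
	defer cli.Close()

	members, err := cli.MemberList(ctx)
	if err != nil {
		return fmt.Errorf("listing etcd members: %w", err)
	}

	// Find the member living on the node being removed, and pick any other
	// member as a leadership transfer target.
	var removeID, transfereeID uint64
	for _, m := range members.Members {
		if m.Name == nodeName {
			removeID = m.ID
		} else if transfereeID == 0 {
			transfereeID = m.ID
		}
	}
	if removeID == 0 {
		// Nothing to do: the node has no etcd member (e.g. already removed).
		return nil
	}

	// Check whether the doomed member is currently the leader.
	status, err := cli.Status(ctx, endpoints[0])
	if err != nil {
		return fmt.Errorf("reading etcd status: %w", err)
	}
	if status.Leader == removeID && transfereeID != 0 {
		// Note: the MoveLeader request must be served by the current leader;
		// a real controller would direct it at the leader's client URL.
		if _, err := cli.MoveLeader(ctx, transfereeID); err != nil {
			return fmt.Errorf("moving etcd leader: %w", err)
		}
	}

	if _, err := cli.MemberRemove(ctx, removeID); err != nil {
		return fmt.Errorf("removing etcd member: %w", err)
	}
	return nil
}
```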
RKE2 follows a different certificate management model than CAPI, so we can't provide CAPI certificates on node bootstrap; instead, we have to fetch the certificate generated by the rke2 server during bootstrapping.
New clusters will use the regular CAPI certificate management model, where the certificates are provided to the RKE2 agent on initialization and are generated by cabprke2 if missing. For the time being there will therefore be two co-existing cluster configurations, which will be reduced to the upstream one by performing certificate rotation in the future.
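For illustration only, a sketch of how the etcd client certificates could be loaded from a Kubernetes secret and turned into a TLS config for the etcd client above. The secret name, namespace, and data keys (tls.crt, tls.key, ca.crt) are assumptions, not the provider's actual layout:

```go
package etcdscaledown

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// etcdTLSFromSecret builds a TLS config for the child cluster's etcd from a
// Kubernetes secret holding the CA, client certificate, and client key.
// The key names below are illustrative; the actual layout depends on how the
// provider stores the rke2-generated (or CAPI-provided) etcd certificates.
func etcdTLSFromSecret(ctx context.Context, c client.Client, namespace, name string) (*tls.Config, error) {
	var secret corev1.Secret
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, &secret); err != nil {
		return nil, fmt.Errorf("fetching etcd certificate secret: %w", err)
	}

	cert, err := tls.X509KeyPair(secret.Data["tls.crt"], secret.Data["tls.key"])
	if err != nil {
		return nil, fmt.Errorf("parsing etcd client keypair: %w", err)
	}

	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(secret.Data["ca.crt"]) {
		return nil, fmt.Errorf("no valid CA certificates in secret %s/%s", namespace, name)
	}

	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
		MinVersion:   tls.VersionTLS12,
	}, nil
}
```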
Existing clusters will not be required to perform a scale-up for the fix to take effect; a regular upgrade will do this, even with the MaxSurge=0 setting. For such clusters, if both the local etcd secret and the child cluster's bootstrapped etcd secret are missing, no etcd operations will be performed. The first upgraded node, however, will act as the "scale up" that populates the child cluster secret. This way it is guaranteed that every cluster will eventually get the fix.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #263
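To make the secret-based decision described above concrete, here is a hedged sketch of the "skip etcd operations if both secrets are missing" check; the secret names and helper function are hypothetical:

```go
package etcdscaledown

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// shouldManageEtcd reports whether etcd member management should be attempted
// for a cluster. The secret names are assumed for illustration. If neither the
// locally generated etcd secret nor the secret bootstrapped from the child
// cluster exists, etcd operations are skipped and the next upgraded node is
// expected to populate the child cluster secret.
func shouldManageEtcd(ctx context.Context, c client.Client, namespace, clusterName string) (bool, error) {
	local, err := secretExists(ctx, c, namespace, clusterName+"-etcd") // assumed name
	if err != nil {
		return false, err
	}
	bootstrapped, err := secretExists(ctx, c, namespace, clusterName+"-etcd-bootstrapped") // assumed name
	if err != nil {
		return false, err
	}
	return local || bootstrapped, nil
}

// secretExists checks for the presence of a secret, treating NotFound as a
// normal "absent" result rather than an error.
func secretExists(ctx context.Context, c client.Client, namespace, name string) (bool, error) {
	var s corev1.Secret
	err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: name}, &s)
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```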
Special notes for your reviewer:
Checklist: