Improvements to the Gateway leader election #2747

Closed
sridhargaddam opened this issue Oct 12, 2023 · 1 comment · Fixed by submariner-io/shipyard#1447
Labels: bug, priority:high

Comments

sridhargaddam commented Oct 12, 2023

While troubleshooting various issues with OCP, the following was observed.

The current configuration for Gateway leadership is as follows:

  • defaultLeaseDuration = 10 seconds
  • defaultRenewDeadline = 5 seconds
  • defaultRetryPeriod = 2 seconds
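To make the relationship between these three values concrete, here is a minimal sketch using only the Go standard library. It assumes the ordering that client-go's leaderelection package validates (lease duration greater than renew deadline, renew deadline greater than retry period times a 1.2 jitter factor). The `ElectionTimings` type, `NewTimings` and `RenewAttempts` are hypothetical names for illustration and are not Submariner code. The attempt count is a rough estimate that ignores API request latency.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// jitterFactor mirrors the 1.2 factor client-go applies to the retry period
// when it validates a leader election configuration (assumption, see lead-in).
const jitterFactor = 1.2

// ElectionTimings is a hypothetical holder for the three settings in this issue.
type ElectionTimings struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// NewTimings builds ElectionTimings from whole seconds.
func NewTimings(leaseSec, renewSec, retrySec int) ElectionTimings {
	return ElectionTimings{
		LeaseDuration: time.Duration(leaseSec) * time.Second,
		RenewDeadline: time.Duration(renewSec) * time.Second,
		RetryPeriod:   time.Duration(retrySec) * time.Second,
	}
}

// Validate checks LeaseDuration > RenewDeadline > RetryPeriod * jitterFactor.
func (t ElectionTimings) Validate() error {
	if t.LeaseDuration <= t.RenewDeadline {
		return errors.New("lease duration must be greater than renew deadline")
	}
	if t.RenewDeadline <= time.Duration(jitterFactor*float64(t.RetryPeriod)) {
		return errors.New("renew deadline must be greater than retry period * jitter factor")
	}
	return nil
}

// RenewAttempts estimates how many retry periods fit into the renew deadline.
// It ignores the time each API call takes, so it is an upper bound at best.
func (t ElectionTimings) RenewAttempts() int {
	if t.RetryPeriod <= 0 {
		return 0
	}
	return int(t.RenewDeadline / t.RetryPeriod)
}

func main() {
	// Current defaults, then the larger values proposed later in this issue.
	for _, t := range []ElectionTimings{NewTimings(10, 5, 2), NewTimings(15, 10, 2)} {
		if err := t.Validate(); err != nil {
			fmt.Println("invalid timings:", err)
			continue
		}
		fmt.Printf("lease=%s renew=%s retry=%s: about %d renewal attempts before the deadline\n",
			t.LeaseDuration, t.RenewDeadline, t.RetryPeriod, t.RenewAttempts())
	}
}
```

With a 2-second retry period, a 5-second renew deadline leaves room for only a couple of attempts, which is why a slow API server under load can exhaust it quickly.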

During times of heavy load, when multiple nodes are labeled as gateway nodes, the active Gateway leader sometimes fails to renew its lease within the 5-second renew deadline and times out:

E1009 15:52:02.920492       1 leaderelection.go:327] error retrieving resource lock submariner-operator/submariner-gateway-lock: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/submariner-operator/leases/submariner-gateway-lock": context deadline exceeded
I1009 15:52:02.920599       1 leaderelection.go:280] failed to renew lease submariner-operator/submariner-gateway-lock: timed out waiting for the condition
2023-10-09T15:52:02.971Z DBG ..ols/record/event.go:298 main                 Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"submariner-operator", Name:"submariner-gateway-lock", UID:"9a3de507-5553-4341-b55c-01969f1bf699", APIVersion:"v1", ResourceVersion:"33398175", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' compute-1-submariner-gateway stopped leading
2023-10-09T15:52:02.972Z DBG ..ols/record/event.go:298 main                 Event(v1.ObjectReference{Kind:"Lease", Namespace:"submariner-operator", Name:"submariner-gateway-lock", UID:"caed3c1e-f5ee-4dbd-a8a1-e3ce708bf83c", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"33398176", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' compute-1-submariner-gateway stopped leading
2023-10-09T15:52:03.395Z WRN ..ne/syncer/syncer.go:184 GWSyncer             Deleted stale gateway: compute-2, didn't report for 15s
2023-10-09T15:52:03.415Z INF ..ne/syncer/syncer.go:290 GWSyncer             The Gateway entry for "compute-1" has been deleted
2023-10-09T15:52:03.416Z FTL ..al/pkg/log/logger.go:67 main                 Leader election lost, shutting down

Currently, this situation is treated as a FATAL error, leading to the restart of the Gateway pod.

Proposed Improvements:

  1. Adjust Lease and Renewal Duration: Increase defaultLeaseDuration to 15 seconds and defaultRenewDeadline to 10 seconds in the Submariner Gateway pod, or to other appropriate values.
  2. Handling Renewal Failures: Instead of treating the renewal timeout as FATAL, log a WARNING and keep retrying to acquire the leadership lock.
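The second proposal can be sketched as a loop that re-enters the election after a lost lease instead of exiting the process. This is only an illustration under stated assumptions: `electFunc`, `runWithRetry` and `simulate` are hypothetical names, the election itself is stubbed out, and a real implementation would also have to make sure any leader-only work is stopped before retrying.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// electFunc is a hypothetical stand-in for one blocking leader election run.
// It returns when leadership is lost or when the context is cancelled.
type electFunc func(ctx context.Context) error

// runWithRetry logs a warning on each lost election and tries again after
// backoff, rather than treating the loss as fatal. It returns the number of
// losses seen before the context was cancelled.
func runWithRetry(ctx context.Context, elect electFunc, backoff time.Duration) int {
	losses := 0
	for {
		err := elect(ctx)
		if ctx.Err() != nil {
			// Shutdown was requested; this is not a lost election.
			return losses
		}
		losses++
		log.Printf("WARNING: leader election lost (%v), retrying in %s", err, backoff)

		select {
		case <-ctx.Done():
			return losses
		case <-time.After(backoff):
		}
	}
}

// simulate runs the loop against a stub that fails `failures` times and then
// cancels the context, as if the pod were asked to shut down.
func simulate(failures int) (calls, losses int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	elect := func(ctx context.Context) error {
		calls++
		if calls > failures {
			cancel()
			return ctx.Err()
		}
		return errors.New("failed to renew lease")
	}

	losses = runWithRetry(ctx, elect, time.Millisecond)
	return calls, losses
}

func main() {
	calls, losses := simulate(2)
	log.Printf("election attempts: %d, losses survived without exiting: %d", calls, losses)
}
```

The point of the design is that a transient API server slowdown costs a warning and a retry rather than a pod restart.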

Implementing these improvements should make the datapath noticeably more stable in HA environments.
Also, in the current implementation, restarting the Gateway pod causes a brief datapath downtime with the Libreswan cable driver. Avoiding Gateway pod restarts would alleviate this issue.

Additionally, for Globalnet deployments, the Globalnet controllers track the active leader. By reducing gateway migrations, we will have a more stable environment that minimizes the potential for race conditions.

@sridhargaddam sridhargaddam added the bug Something isn't working label Oct 12, 2023
@tpantelis tpantelis self-assigned this Oct 12, 2023

skitt commented Oct 13, 2023

I agree with both proposals.

I can imagine the election settings needing tweaking depending on clusterset structure; it would be useful to expose them in the configuration objects, wouldn’t it? The leadership configuration can be controlled using environment variables on the Submariner pod but nothing currently sets them.
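As a rough illustration of environment-variable overrides, the sketch below reads each timing from the environment and falls back to the current defaults. The variable names (`EXAMPLE_GW_LEASE_DURATION` and so on) and the helpers `parseSeconds` and `durationFromEnv` are made up for this example; they are not the names Submariner actually reads.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// parseSeconds converts raw to a positive whole number of seconds, returning
// defSeconds when raw is empty, not a number, or not positive.
func parseSeconds(raw string, defSeconds int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defSeconds
	}
	return n
}

// durationFromEnv reads the named variable and falls back to defSeconds.
func durationFromEnv(name string, defSeconds int) time.Duration {
	return time.Duration(parseSeconds(os.Getenv(name), defSeconds)) * time.Second
}

func main() {
	// Hypothetical variable names; the defaults match the values in this issue.
	lease := durationFromEnv("EXAMPLE_GW_LEASE_DURATION", 10)
	renew := durationFromEnv("EXAMPLE_GW_RENEW_DEADLINE", 5)
	retry := durationFromEnv("EXAMPLE_GW_RETRY_PERIOD", 2)

	fmt.Printf("lease=%s renew=%s retry=%s\n", lease, renew, retry)
}
```

Exposing the same values in the configuration objects would then amount to having the operator set these variables on the pod.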

@tpantelis tpantelis moved this from Todo to In Progress in Submariner 0.17 Oct 18, 2023
tpantelis added a commit to tpantelis/submariner that referenced this issue Oct 20, 2023
...rather than exiting.

Related to submariner-io#2747

Signed-off-by: Tom Pantelis <[email protected]>
@nyechiel nyechiel moved this from In Progress to In Review in Submariner 0.17 Oct 30, 2023
tpantelis added a commit that referenced this issue Oct 30, 2023
...rather than exiting.

Related to #2747

Signed-off-by: Tom Pantelis <[email protected]>
sridhargaddam pushed a commit that referenced this issue Oct 31, 2023
...rather than exiting.

Related to #2747

Signed-off-by: Tom Pantelis <[email protected]>
@github-project-automation github-project-automation bot moved this from In Review to Done in Submariner 0.17 Oct 31, 2023
novad03 added a commit to novad03/k8s-submariner that referenced this issue Nov 25, 2023