OPNET-629: Mark haproxy unhealthy if no healthy backends #4767

cybertron · 2024-12-18T22:24:47Z

Previously we avoided doing this because of potential issues in unhealthy clusters where backends were flapping and we didn't want to trigger failovers. However, given the nature of the firewall rule monitor check, that approach was not effective anyway, and allowing HAProxy to report its own status to the monitor is much more robust than relying on API calls being routed correctly during API rollouts.

This is being implemented as a separate monitor endpoint because we don't want the kubelet liveness probes to fail just because there are no backends (an expected state early in cluster deployment), which would trigger unnecessary crash loops.
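To make the split between the two endpoints concrete, here is a minimal Go sketch that models the intended behavior. It is not the HAProxy configuration from this PR: the paths `/haproxy_ready` and `/haproxy_monitor`, the `backendPool` type and its healthy-backend counter are hypothetical stand-ins. The liveness endpoint keeps returning 200 when zero backends are healthy, so the kubelet does not restart the pod, while the monitor endpoint returns 503 so the external monitor can react. Within HAProxy itself, a comparable effect is documented via `monitor-uri` combined with `monitor fail if { nbsrv(<backend>) lt 1 }`, though this PR may implement it differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// backendPool is a hypothetical stand-in for HAProxy's view of the API backends.
type backendPool struct {
	healthy atomic.Int32 // number of backends currently passing health checks
}

// livenessHandler models the existing kubelet-facing healthcheck: it only says
// the proxy process is up, so it stays 200 even when no backend is healthy.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// monitorHandler models the new monitor endpoint: it reports 503 when there
// are no healthy backends, so the external monitor marks this node unhealthy.
func (p *backendPool) monitorHandler(w http.ResponseWriter, r *http.Request) {
	if p.healthy.Load() < 1 {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// newMux wires both hypothetical endpoints onto one handler.
func newMux(p *backendPool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/haproxy_ready", livenessHandler)
	mux.HandleFunc("/haproxy_monitor", p.monitorHandler)
	return mux
}

// status issues an in-process GET against path and returns the HTTP status code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	pool := &backendPool{}
	mux := newMux(pool)
	// Early in deployment: no backends yet.
	fmt.Println(status(mux, "/haproxy_ready"), status(mux, "/haproxy_monitor")) // 200 503
	// Backends have come up.
	pool.healthy.Store(2)
	fmt.Println(status(mux, "/haproxy_ready"), status(mux, "/haproxy_monitor")) // 200 200
}
```

Keeping the two checks on separate endpoints means the "no backends" state only affects failover decisions, never pod restarts.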

- What I did

- How to verify it

- Description for the changelog


openshift-ci-robot · 2024-12-18T22:24:51Z

@cybertron: This pull request references OPNET-629 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

This is the runtimecfg change corresponding to openshift/machine-config-operator#4767, which switches the monitor call to the HAProxy endpoint rather than calling through to the API.

openshift-ci · 2024-12-18T22:26:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
Once this PR has been reviewed and has the lgtm label, please assign djoshy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

To improve the robustness of the on-prem HAProxy instance, we have added a second healthcheck port in openshift/machine-config-operator#4767. It corresponds to the existing port 9444, but because the neighboring ports were already in use, I offset it by an even 10.

openshift-ci · 2024-12-19T02:45:32Z

@cybertron: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | `e7cca24` | link | false | `/test e2e-azure-ovn-upgrade-out-of-change` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 18, 2024

openshift-ci bot requested review from jcpowermac and patrickdillon December 18, 2024 22:25

cybertron mentioned this pull request Dec 18, 2024

OPNET-629: Use HAProxy monitor endpoint instead of API openshift/baremetal-runtimecfg#336

Open

cybertron mentioned this pull request Dec 18, 2024

OPNET-629: Add second HAProxy healthcheck port openshift/enhancements#1728

Open
