OPNET-629: Mark haproxy unhealthy if no healthy backends #4767

Open: 1 commit to be merged into base: master

cybertron
Member

Previously we avoided doing this because of potential issues in unhealthy clusters: backends could be flapping, and we didn't want that to trigger failovers. However, given the nature of the firewall rule monitor check, that approach was not effective anyway. Allowing HAProxy to report its own status to the monitor is much more robust than relying on API calls being routed correctly while API rollouts are happening.

This is being implemented as a separate monitor endpoint because we don't want the Kubelet liveness probes to fail just because there are no backends (which is an expected state in early cluster deployment). That would trigger unnecessary crash loops.
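
The split between the two checks can be illustrated with a small model. This is only a sketch of the reasoning above, not the code in this PR: the `Backend` type and the function names `liveness_status` and `monitor_status` are hypothetical, and the 200/503 status codes are an assumption about how healthy and unhealthy states would be reported.

```python
from dataclasses import dataclass
from typing import List

HTTP_OK = 200
HTTP_SERVICE_UNAVAILABLE = 503


@dataclass
class Backend:
    """Hypothetical stand-in for one API server backend behind the proxy."""
    name: str
    healthy: bool


def liveness_status(backends: List[Backend]) -> int:
    """Liveness check: only says the proxy process itself is serving.

    Backend health is deliberately ignored, so a cluster that has no
    backends yet (early deployment) does not cause restarts.
    """
    return HTTP_OK


def monitor_status(backends: List[Backend]) -> int:
    """Monitor check: unhealthy when no backend is healthy.

    This is the signal a failover monitor would consume.
    """
    healthy = sum(1 for b in backends if b.healthy)
    return HTTP_OK if healthy > 0 else HTTP_SERVICE_UNAVAILABLE


if __name__ == "__main__":
    early_cluster: List[Backend] = []  # no backends registered yet
    print(liveness_status(early_cluster), monitor_status(early_cluster))
```

With an empty backend list the liveness check still passes while the monitor check fails, which is the behavior the description argues for: the monitor can react, but the kubelet does not restart the pod.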

- What I did

- How to verify it

- Description for the changelog

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Dec 18, 2024
Contributor

openshift-ci-robot commented Dec 18, 2024

@cybertron: This pull request references OPNET-629 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.


cybertron added a commit to cybertron/baremetal-runtimecfg that referenced this pull request Dec 18, 2024
This is the runtimecfg change corresponding to
openshift/machine-config-operator#4767
which switches the monitor call to the HAProxy endpoint rather than
calling through to the API.
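
As a rough illustration of what probing a proxy-local health endpoint instead of the API could look like, here is a minimal sketch. It is not the runtimecfg implementation (which lives in a separate repository); the function name `endpoint_is_healthy` and the URL in the usage example, including its port and path, are hypothetical placeholders.

```python
import urllib.request


def endpoint_is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200.

    Any error response (for example 503), connection failure, or
    timeout is treated as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses,
        # as are socket timeouts and refused connections.
        return False


if __name__ == "__main__":
    # Hypothetical address: substitute the real monitor port and path.
    print(endpoint_is_healthy("http://127.0.0.1:8080/healthz"))
```

The point of the design is that the check depends only on the local proxy answering, not on a request being routed through to a working API server.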
Contributor

openshift-ci bot commented Dec 18, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
Once this PR has been reviewed and has the lgtm label, please assign djoshy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cybertron added a commit to cybertron/enhancements that referenced this pull request Dec 18, 2024
In order to improve the robustness of the on-prem HAProxy instance,
we have added a second healthcheck port in
openshift/machine-config-operator#4767
This corresponds to the existing 9444 port, but because the
surrounding ports were already in use I moved it an even 10 away.
Contributor

openshift-ci bot commented Dec 19, 2024

@cybertron: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | e7cca24 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |

