Fix: Do not emit health_status event for each health check attempt. #24005

harish2704 · 2024-09-18T17:24:09Z

Emit health_status event only if there is a change in health_status
Fixes #24003

Does this PR introduce a user-facing change?

Emit `health_status` event only when there a status/state change for `healthy` flag

openshift-ci · 2024-09-18T17:24:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harish2704
Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

packit-as-a-service · 2024-09-18T17:54:10Z

Cockpit tests failed for commit 2771216d55d7fec0e86642e774fdc73f59d4a409. @martinpitt, @jelly, @mvollmer please check.

martinpitt · 2024-09-19T03:18:30Z

This "breaks" cockpit-podman's testHealthcheckSystem, which assumes the current behavior of getting regular "health check ran and passed" events. I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise. So I'll defer to the review of the podman developers, and if this change is approved, I'll adjust cockpit-podman's tests.

Luap99

I don't have a good gut feeling whether the regular "pings of life" are expected/by design or considered noise

I don't think there is a formal design on how it should work but the way it works today is how the current users will expect it to work. The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such I consider this a breaking change that is not suitable for a minor version so this would have to wait for podman 6.0 if we want to do that at all. I think reducing the event spam is a good idea in general.

Now one thing we should consider if docker doesn't behave this way our docker compat api should not behave this way either. One way would be to a new field to the health_status event that is set when the status changed and then we can filter out the events that did not have this set to make the docker clients work correctly at least.

cc @mheon @Honny1 In case you have opinions as you have been working on other healthcheck events work

libpod/healthcheck.go

Honny1

Why not, reduce the noise in the logs, but I would wait for Podman 6.0. I'd also be in favor of adding a flag to enable log "saving" for each run, for podman healtcheck run (probably also for podman run), for debugging purposes or if something goes wrong.

harish2704 · 2024-09-19T13:39:10Z

@Luap99

I don't think there is a formal design on how it should work but the way it works today is how the current users will expect it to work.

I have to agree on this because I searched for such a documentation ( https://docs.docker.com/reference/api/engine/version/v1.41/#tag/System/operation/SystemEvents ) and there is no description about exact behavior of those events . They have only provided the list of valid events

The fact that it breaks cockpit testing is a good sign that we have users depending on it. As such I consider this a breaking change that is not suitable for a minor version so this would have to wait for podman 6.0 if we want to do that at all

IMHO, Podman has a huge opportunity as secure docker replacement and in that sense most of the users ( and applications targeting docker eg: Traefik ) will expect it work similar to docker. To address the user base who is looking to switch to Podman, it should be considered as a bug. Please note that, here i am not referring the podman cli tool. I am talking about docker compatible API which podman provides.

Now one thing we should consider if docker doesn't behave this way our docker compat api should not behave this way either. One way would be to a new field to the health_status event that is set when the status changed and then we can filter out the events that did not have this set to make the docker clients work correctly at least.

Exactly. This is the point I was trying to convey. I checked how podman-docker works but it is found to be a simple bash wrapper script for podman cli ( at-least in my Fedora-40 ) .
So, If my above understanding is correct ( ie, podman-docker is just an alias/wrapper of podman ) then, podman-docker is not exposing a separate docker compatible API ( either via unix socket or via TCP ) .
All we have is an API exposed by podman and it is expected to be docker compatible.

So, in short, my bug report is not about behavior of podma-cli tool. it is about the docker-compatible API which podman exposes.
if any of you can provide some hints on how to fix this issue without changing behaviour podman cli I am happy to try that. To be specific, my question is , at which point, Golang event is converted to HTTP API data?
I will try to add the additional flag as you mentioned in the comment

mheon · 2024-09-19T14:38:49Z

IMO, this should not be default. I would not mind adding a config field in containers.conf to enable this more minimal events output but we should not (and, as @Luap99 pointed out, cannot without a major version) do this by default.

Luap99 · 2024-09-19T16:54:10Z

@harish2704 Our API socket is split in two parts the normal docker api endpoint and the our libpod endpoints that all start with /version/libpod/... so all the other docker compatiable endpoints can and should be changed to match docker api as closely as possible.

Look into pkg/api/handlers/compat/events.go there we use the code for both endpoints but if you look there into the logic you will find !utils.IsLibpodRequest(r) usage there so based on that you can change the behavior there. Now of course because the event stream cannot known when such a healthcheck state change happens you need to add this info into the event itself so that you can filter it there based on that which is why I suggest added this field or attribute to the event type.

packit-as-a-service · 2024-09-19T18:12:38Z

Cockpit tests failed for commit 5b8d32d26bc1eebc45d82b40e9866a0459a6a500. @martinpitt, @jelly, @mvollmer please check.

packit-as-a-service · 2024-09-20T09:35:20Z

Cockpit tests failed for commit 21b6aa68c0f65c4385fad0941c08fa14e9de8dfb. @martinpitt, @jelly, @mvollmer please check.

Emit event only if there is a change in health_status Fixes containers#24003 Resolves containers#24005 (comment) Pass additional isChanged flag to event creation function Fix health check events for docker api Signed-off-by: Harish Karumuthil <[email protected]>

harish2704 · 2024-09-20T19:20:14Z

@Luap99
I made the changes as you suggested. In short, they are

Added extra flag to to Event struct to indicate whether health status changed or not
Marshal & parse that flag while writing to Journald log
Check that flag before writing/publishing an even in docker compatible API. Other APIs are not touched.

ie, if we run podman events we can see the old normal behavior of too many logs
but if we run env DOCKER_HOST=unix://<path-to-podman.sock> docker logs then we can not see to much logs for health check. Logs are emitted only when there is a state change

Signed-off-by: Harish Karumuthil <[email protected]>

packit-as-a-service · 2024-09-20T19:38:05Z

Cockpit tests failed for commit 3ec4a744dbd9f95e5667b242f58ec110dc4049bc. @martinpitt, @jelly, @mvollmer please check.

packit-as-a-service · 2024-09-20T19:43:36Z

Cockpit tests failed for commit cecdca7. @martinpitt, @jelly, @mvollmer please check.

packit-as-a-service · 2024-09-20T19:44:53Z

Cockpit tests failed for commit 4597408. @martinpitt, @jelly, @mvollmer please check.

Luap99

This will need some API level tests to ensure it working, have a look at test/apiv2

Luap99 · 2024-09-23T11:21:35Z

libpod/events/config.go

+	// Whether state change happened for HealthStatus or not
+	IsHealthStatusChanged bool


This needs to be at least omitempty like other fields so it doesn't show up on non hc events. Maybe it would be better to add this into into the Attributes map instead, @mheon WDYT?

By reading the code, I understand that attribute map is fully allocated to storing container's labels
That is why I decided not to choose that. ( Initially I thought to store this in attributes map )

@Luap99 , @mheon : Please let me know what is the correct place to store these data. I will rebase, make changes & test my changes accordingly

Luap99 · 2024-09-23T11:25:25Z

libpod/events.go

-func (c *Container) newContainerHealthCheckEvent(healthStatus string) {
-	if err := c.newContainerEventWithInspectData(events.HealthStatus, healthStatus, false); err != nil {
+func (c *Container) newContainerHealthCheckEvent(healthStatus string, isHcStatusChanged bool) {
+	if err := c.newContainerEventWithInspectData(events.HealthStatus, events.EventMetadata{HealthStatus: healthStatus, IsHealthStatusChanged: isHcStatusChanged}, false); err != nil {


This will cause conflicts with #23900, not sure what is best here and which one should be merged first.

I would merge #23900 first. I think this PR will need more care.

Luap99 · 2024-09-23T11:26:23Z

pkg/api/handlers/compat/events.go

@@ -93,6 +93,10 @@ func GetEvents(w http.ResponseWriter, r *http.Request) {
 			if evt == nil {
 				continue
 			}
+			if evt.Status == events.HealthStatus && !evt.IsHealthStatusChanged{


This needs a !utils.IsLibpodRequest(r) check like the other docker compat changes below as this function is used for both the docker and libpod endpoint.

openshift-merge-robot · 2024-09-29T23:28:57Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

github-actions · 2024-10-31T00:09:08Z

A friendly reminder that this PR had no activity for 30 days.

openshift-ci bot added the release-note label Sep 18, 2024

Luap99 requested changes Sep 19, 2024

View reviewed changes

libpod/healthcheck.go Outdated Show resolved Hide resolved

Honny1 reviewed Sep 19, 2024

View reviewed changes

github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Sep 20, 2024

harish2704 force-pushed the fix-reduce-duplicate-health-check-events branch from 21b6aa6 to 3ec4a74 Compare September 20, 2024 19:11

harish2704 force-pushed the fix-reduce-duplicate-health-check-events branch from 3ec4a74 to cecdca7 Compare September 20, 2024 19:12

Fix Go fmt

4597408

Signed-off-by: Harish Karumuthil <[email protected]>

Luap99 reviewed Sep 23, 2024

View reviewed changes

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2024

github-actions bot added the stale-pr label Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Do not emit health_status event for each health check attempt. #24005

Fix: Do not emit health_status event for each health check attempt. #24005

harish2704 commented Sep 18, 2024

openshift-ci bot commented Sep 18, 2024

packit-as-a-service bot commented Sep 18, 2024

martinpitt commented Sep 19, 2024

Luap99 left a comment

Honny1 left a comment

harish2704 commented Sep 19, 2024

mheon commented Sep 19, 2024

Luap99 commented Sep 19, 2024

packit-as-a-service bot commented Sep 19, 2024

packit-as-a-service bot commented Sep 20, 2024

harish2704 commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

Luap99 left a comment

Luap99 Sep 23, 2024

harish2704 Sep 23, 2024

harish2704 Sep 30, 2024

Luap99 Sep 23, 2024

Honny1 Sep 23, 2024

Luap99 Sep 23, 2024

openshift-merge-robot commented Sep 29, 2024

github-actions bot commented Oct 31, 2024

		// Whether state change happened for HealthStatus or not
		IsHealthStatusChanged bool

Fix: Do not emit health_status event for each health check attempt. #24005

Are you sure you want to change the base?

Fix: Do not emit health_status event for each health check attempt. #24005

Conversation

harish2704 commented Sep 18, 2024

Does this PR introduce a user-facing change?

openshift-ci bot commented Sep 18, 2024

packit-as-a-service bot commented Sep 18, 2024

martinpitt commented Sep 19, 2024

Luap99 left a comment

Choose a reason for hiding this comment

Honny1 left a comment

Choose a reason for hiding this comment

harish2704 commented Sep 19, 2024

mheon commented Sep 19, 2024

Luap99 commented Sep 19, 2024

packit-as-a-service bot commented Sep 19, 2024

packit-as-a-service bot commented Sep 20, 2024

harish2704 commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

packit-as-a-service bot commented Sep 20, 2024

Luap99 left a comment

Choose a reason for hiding this comment

Luap99 Sep 23, 2024

Choose a reason for hiding this comment

harish2704 Sep 23, 2024

Choose a reason for hiding this comment

harish2704 Sep 30, 2024

Choose a reason for hiding this comment

Luap99 Sep 23, 2024

Choose a reason for hiding this comment

Honny1 Sep 23, 2024

Choose a reason for hiding this comment

Luap99 Sep 23, 2024

Choose a reason for hiding this comment

openshift-merge-robot commented Sep 29, 2024

github-actions bot commented Oct 31, 2024