Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

saveHostStatus triggers even on unchanged object #1253

Closed
s3rj1k opened this issue Apr 17, 2023 · 19 comments · Fixed by #1881
Closed

saveHostStatus triggers even on unchanged object #1253

s3rj1k opened this issue Apr 17, 2023 · 19 comments · Fixed by #1881
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@s3rj1k
Copy link
Member

s3rj1k commented Apr 17, 2023

func (r *BareMetalHostReconciler) saveHostStatus(host *metal3v1alpha1.BareMetalHost) error {
	t := metav1.Now()
	host.Status.LastUpdated = &t

	return r.Status().Update(context.TODO(), host)
}

There is an edge case when saveHostStatus is triggered even when no actual objects changes are made, the only change to object is updating LastUpdated filed.

It would be better to have this field updated only if something in status was actually changed.

Current behavior adds unneeded reconciles for external watchers that follow BMH changes, this becomes noticeable on big clusters with lots for BMH.

@metal3-io-bot metal3-io-bot added the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Apr 17, 2023
@s3rj1k
Copy link
Member Author

s3rj1k commented Apr 17, 2023

@dtantsur What do you think?

@dtantsur
Copy link
Member

From a quick look at the source code, we do seem to only call saveHostStatus when the action returns a dirty flag. Maybe we sometimes do it incorrectly. Do you have specific examples when it happens @s3rj1k?

@Rozzii Rozzii added the triage/needs-information Indicates an issue needs more information in order to work on it. label May 24, 2023
@s3rj1k
Copy link
Member Author

s3rj1k commented Jun 21, 2023

@Rozzii What information is needed?

@s3rj1k
Copy link
Member Author

s3rj1k commented Jun 21, 2023

Do you have specific examples when it happens @s3rj1k?

It affected me noticeably during the pre-provision phase (in big cluster), when machines are getting linked with hosts, host where changing constantly.

@Rozzii
Copy link
Member

Rozzii commented Jun 21, 2023

@s3rj1k I would like to have steps to reproduce the issue, as far as I understand from your description this happens when there are a lot of new machines are getting registered but from that I can't tell what exactly is causing the issue.

@s3rj1k
Copy link
Member Author

s3rj1k commented Jul 24, 2023

@s3rj1k I would like to have steps to reproduce the issue, as far as I understand from your description this happens when there are a lot of new machines are getting registered but from that I can't tell what exactly is causing the issue.

I am not sure what extra steps you need to reproduce this, the issue of NOP updates is there all the time, just check the code, updates are unconditional, every reconcile (in not provisioned states) time is getting updated.

@Rozzii
Copy link
Member

Rozzii commented Aug 2, 2023

/triage accepted
/help

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Aug 2, 2023
@Rozzii
Copy link
Member

Rozzii commented Aug 2, 2023

I don't have experience or time to deal wit this issue but if it is visible from the code for someone with the experience then I am fine accepting this as a bug.

@metal3-io metal3-io deleted a comment from metal3-io-bot Aug 2, 2023
@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2023
@Rozzii
Copy link
Member

Rozzii commented Nov 1, 2023

I will go ahead and accept this, although I am not that familiar with this part of the code, the issue makes sense to me and no one else have reacted.
/accept
/remove-lifecycle stale
/lifecycle frozen

@metal3-io-bot metal3-io-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 1, 2023
@Rozzii Rozzii removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Nov 1, 2023
@troy0820
Copy link
Contributor

/assign @troy0820

@zaneb
Copy link
Member

zaneb commented Aug 27, 2024

I am not sure what extra steps you need to reproduce this, the issue of NOP updates is there all the time, just check the code, updates are unconditional, every reconcile (in not provisioned states) time is getting updated.

This is referring to https://github.com/metal3-io/baremetal-operator/blob/release-0.3/controllers/metal3.io/baremetalhost_controller.go#L1538-L1541

However, I checked the code and as Dmitry said, we only call this if something else had changed: https://github.com/metal3-io/baremetal-operator/blob/release-0.3/controllers/metal3.io/baremetalhost_controller.go#L235-L245

Without even a single example of this NOP update happening - even a log would be helpful, we always log when we update the Status - there's no way for us to tell if any change has actually had any effect.

@zaneb zaneb added triage/needs-information Indicates an issue needs more information in order to work on it. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Aug 27, 2024
@Rozzii Rozzii removed the triage/accepted Indicates an issue is ready to be actively worked on. label Sep 3, 2024
@metal3-io-bot
Copy link
Contributor

This issue is currently awaiting triage.
If Metal3.io contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Sep 3, 2024
@Rozzii
Copy link
Member

Rozzii commented Sep 3, 2024

Both @dtantsur and @zaneb have unanswered questions around the same topic thus I have removed the "accepted" label until they receive the info they need to evaluate the issue.

@s3rj1k
Copy link
Member Author

s3rj1k commented Sep 3, 2024

@zaneb, @dtantsur #1253 (comment)

Do you agree that this specific function is updating status disregarding if the actual change is happening?
Meaning it does not check if in fact old host and new host are different.

@zaneb
Copy link
Member

zaneb commented Sep 4, 2024

@zaneb, @dtantsur #1253 (comment)

Do you agree that this specific function is updating status disregarding if the actual change is happening?

I don't disagree with that.
I disagree that anybody in this thread has identified an instance where that function is called without an actual change happening. If you find one I'd consider it a (minor) bug, but I don't know where to look.

@s3rj1k
Copy link
Member Author

s3rj1k commented Sep 4, 2024

I don't disagree with that.

Issue is exclusively about that piece if code.

If you find one I'd consider it a (minor) bug.

Feel free to close the issue.

@zaneb
Copy link
Member

zaneb commented Sep 4, 2024

OK, thanks. Feel free to open a new issue if you see writes in the logs that don't correspond to any change in the status.
/close

@metal3-io-bot
Copy link
Contributor

@zaneb: Closing this issue.

In response to this:

OK, thanks. Feel free to open a new issue if you see writes in the logs that don't correspond to any change in the status.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants