enhancements/machine-config: add updates to PinnedImageSet #1599
Conversation
Signed-off-by: Sam Batschelet <[email protected]>
Looks good, the implementation makes sense to me and will not result in any major outward-facing MCO changes! Thanks for updating this.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cdoern. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Signed-off-by: Sam Batschelet <[email protected]>

(force-pushed from afe184b to 142664d)
@hexfusion: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
`certificate_writer`. This new feature will operate independently of the
`MachineConfig` for setting configurations. It is advantageous to not use
`MachineConfig` because the configuration and image prefetching can happen
across the nodes in the pool in parallel. The new controller will be watching
Now that we are introducing `PinnedImageSetController`, which new controller are we referring to here?
`pinned_image_manager`. Will clarify if it's not the MCD itself.
Sorry, it was a typo. I meant to ask: now that we are no longer adding `PinnedImageSetController`, which new controller are we referring to here?
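The quoted enhancement text above says image prefetching can happen across the nodes in the pool in parallel, independently of `MachineConfig`. As a rough illustration only, here is a minimal Go sketch of a per-node manager pulling its pinned images concurrently; `pullImage` and `prefetchAll` are hypothetical names, not MCO code.

```go
package main

import (
	"fmt"
	"sync"
)

// pullImage stands in for a per-node CRI pull of one pinned image
// (hypothetical; the real daemon would call the container runtime).
func pullImage(image string) error {
	fmt.Println("pulled", image)
	return nil
}

// prefetchAll pulls all images concurrently and returns the first error seen.
func prefetchAll(images []string) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(images))
	for _, img := range images {
		wg.Add(1)
		go func(img string) {
			defer wg.Done()
			if err := pullImage(img); err != nil {
				errs <- err
			}
		}(img)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when the closed channel is empty
}

func main() {
	if err := prefetchAll([]string{"registry.example/a:v1", "registry.example/b:v1"}); err != nil {
		fmt.Println("prefetch failed:", err)
	}
}
```

Because each node's manager runs this independently, pools make progress without serializing through a rendered `MachineConfig` rollout.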
`pinned_image_manager` will:

1. Begin by marking the node with an annotation to indicate the manager is
`Working` utilizing the `nodeWriter`, similar to the `MachineConfigDaemon` `update`
Just a note: for better troubleshooting, in the MCO implementation, wherever possible we would want to add some logging to make sure the MCP status update is happening via `pinned_image_manager`.
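As a hypothetical illustration of that note: when the manager flips a node to `Working` it can log the transition in the same call, so an MCP status change is traceable back to `pinned_image_manager`. The annotation key and `node` type below are made up for the sketch; the real MCO `nodeWriter` API is not shown.

```go
package main

import "fmt"

// stateAnnotation is an illustrative key, not the MCO's real annotation.
const stateAnnotation = "example.io/pinned-image-manager-state"

type node struct {
	name        string
	annotations map[string]string
}

// setWorking records the Working state and logs which component did it.
func setWorking(n *node) {
	if n.annotations == nil {
		n.annotations = map[string]string{}
	}
	n.annotations[stateAnnotation] = "Working"
	// The log line is the point of the comment: make the actor explicit.
	fmt.Printf("pinned_image_manager: node %s annotated Working\n", n.name)
}

func main() {
	n := &node{name: "worker-0"}
	setWorking(n)
	fmt.Println(n.annotations[stateAnnotation])
}
```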
selectors in the `PinnedImageSet` custom resources.
Once all the relevant images are successfully pinned and downloaded to the
matching nodes, the `pinned_image_manager` will signal the completion of the
process by invoking `nodeWriter.SetDone()`. This action notifies the
Maybe I am overthinking. Calling `nodeWriter.SetDone()` would mean that we are creating two sources where a node is marked as update completed, so we need a way to ensure that we don't do that while the MCD is also performing an update due to a rendered-config change. Probably this isn't an issue for the targeted use case, but making this a general feature would require more thought.
I will reword: there is just a single sync in the MCD, so `Done` will only be applied by the MCD. I was trying to convey that no additional mechanisms were required, but how that will work was not described; I will flesh that out.
I have a similar question here. I guess we're sort of introducing an inter-dependency between the "machineconfig update" and the "pinnedimageset update" which, reading the controller code, shouldn't conflict unless there is a degrade. But this does sort of raise some clarifications such as:
- do we want one's failure to block the other
- if you roll out one pinnedimageset and then try to roll out a second one (i.e. append another set to the list in the MCP), should we be waiting for the first to finish
- if the daemon is working on a node, should it not reboot the node unless pinnedimageset is able to finish
- does the `done` state otherwise affect node updates
- how does the pinnedimageset controller actually know if the `done` state is reacting to the current set of pinnedimageset changes vs, say, the daemon is dead and hasn't reacted to the latest request yet?
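One hypothetical shape for answering these questions is to keep a single writer for the `Done` state and gate it on both conditions at once: the node is only marked done when the rendered config has converged and the daemon has satisfied the latest observed `PinnedImageSet` generation (which also distinguishes a dead or lagging daemon). The field names below are illustrative, not the real MCO state.

```go
package main

import "fmt"

// nodeState is a made-up condensation of the two update sources the
// comments above discuss: rendered-config rollout and pinned images.
type nodeState struct {
	currentConfig      string
	desiredConfig      string
	pinnedGeneration   int64 // generation the daemon last satisfied
	observedGeneration int64 // latest PinnedImageSet generation requested
}

// readyForDone is a single gate before anything like nodeWriter.SetDone()
// would run; a generation mismatch means the daemon has not yet reacted
// to the latest pinnedimageset change.
func readyForDone(s nodeState) bool {
	return s.currentConfig == s.desiredConfig &&
		s.pinnedGeneration == s.observedGeneration
}

func main() {
	s := nodeState{currentConfig: "rendered-1", desiredConfig: "rendered-2",
		pinnedGeneration: 3, observedGeneration: 3}
	fmt.Println(readyForDone(s)) // false: config still rolling out
	s.currentConfig = "rendered-2"
	fmt.Println(readyForDone(s)) // true: both updates converged
}
```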
@@ -17,7 +17,7 @@ api-approvers:
  - "@deads2k"
  - "@JoelSpeed"
creation-date: 2023-09-21
-last-updated: 2023-09-21
+last-updated: 2023-03-15
nit: wrong year ;)
The PinnedImageSet is closely linked with the `MachineConfigPool`, and each
Custom Resource (CR) can be associated with a pool at the
I assume you can reference the same pinnedimageset across different pools?
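On the by-name association this question is about: if pools reference `PinnedImageSet`s by name, nothing in the shape of the data prevents two pools from pointing at the same set. The types below are a hypothetical sketch, not the real `v1alpha1` API.

```go
package main

import "fmt"

// pinnedImageSetRef and poolSpec are illustrative stand-ins for a
// pool-to-set association by name.
type pinnedImageSetRef struct{ Name string }

type poolSpec struct {
	Name            string
	PinnedImageSets []pinnedImageSetRef
}

func main() {
	shared := pinnedImageSetRef{Name: "base-images"}
	worker := poolSpec{Name: "worker", PinnedImageSets: []pinnedImageSetRef{shared}}
	infra := poolSpec{Name: "infra", PinnedImageSets: []pinnedImageSetRef{shared}}
	// Both pools resolve the same set purely by name.
	fmt.Println(worker.PinnedImageSets[0].Name == infra.PinnedImageSets[0].Name)
}
```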
those is created, updated or deleted the controller will start a daemon set that
will do the following in each node of the cluster:
The _machine-config-daemon_ will grow a new `pinned_image_manager` utilizing
the same general flow as the existing `MAchineConfigDaemon` `certificate_writer`. This approach is not dependent on
nit: `MAchineConfigDaemon` -> `MachineConfigDaemon`
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle stale. If this proposal is safe to close now please do so with /close. /lifecycle stale
/close (will open a new PR with updates)
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle rotten. If this proposal is safe to close now please do so with /close. /lifecycle rotten
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /reopen. /close
@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This PR is a follow-up to #1481, adding changes to the planned `v1alpha1` API and `TechPreview` 4.16 implementation.