Pin and pre-load images #1481

Merged Nov 1, 2023 (20 commits). Changes from 1 commit.
232 additions & 0 deletions: enhancements/machine-config/pin-and-pre-load-images.md
---
title: pin-and-pre-load-images
authors:
- "@jhernand"
reviewers:
- "@avishayt"
- "@danielerez"
- "@mrunalp"
- "@nmagnezi"
- "@oourfali"

**Contributor:**

the template asks that you make it clear why you're including each reviewer (i.e. what expertise/area of concern they cover), so they can focus their attention on the appropriate part of the EP.

this can also help the approver ensure we have the right set of reviewers


**Contributor Author:**

I added @danielerez, @nmagnezi, @avishayt and @oourfali because they work on the OpenShift appliance project, which is our first use case for this feature: we would like to use it for upgrades of appliances.

I added @mrunalp because he participated in the development of the image pinning support in CRI-O, and he advised us during the discussions prior to bringing up this EP and its predecessor #1432 (now closed).

I didn't find a place in the template. Should I add it as YAML comments in the header?


**Contributor:**

The template gives an example of the suggested format:

```yaml
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
  - TBD
```


**Contributor Author:**

Thanks for the example; I missed that part because I was looking at the rendered template :-(.

approvers:
- "@sdodson"
- "@zaneb"
- "@LalatenduMohanty"

**Contributor:**

we should aim to have a single approver for EPs, otherwise it becomes unclear who's responsible for deciding that consensus is reached/discussion is complete and giving the final sign-off.


**Contributor Author:**

No objection. Who should be the approver for this one?


**Contributor:**

Scott, Zane, and Lala should decide, or if they don't think it should/can be one of them, they can help find another name.


**Contributor Author:**

@sdodson @zaneb @LalatenduMohanty what do you think?


**Member:**

I hesitate to volunteer anyone, but given that this is in the machine-config directory and appears to be deliberately tightly scoped to the MCO (based on feedback that the previous PR could be split up), I'd have expected it to be either Sinny or Mrunal.


**Contributor Author:**

@mrunalp @sinnykumari would you volunteer?


**Member:**

As this enhancement expects changes to the MCO, someone like @mrunalp or @sinnykumari should be the approver.


**Contributor:**

Please add both me and Mrunal as it involves changes in MCO including ContainerRuntimeConfig.
/cc @haircommander for awareness


**Contributor Author:**

Thanks @sinnykumari, done.

api-approvers:
- "@sdodson"
- "@zaneb"
- "@deads2k"
- "@JoelSpeed"
creation-date: 2023-09-21
last-updated: 2023-09-21
tracking-link:
- https://issues.redhat.com/browse/RFE-4482
see-also:
- https://github.com/openshift/enhancements/pull/1432
- https://github.com/openshift/machine-config-operator/pull/3839
replaces: []
superseded-by: []
---

# Pin and pre-load images

**Member:**

BTW containers/bootc#128 is related to this, and would have the advantage that it'd work in an ergonomic way outside of OCP too.


## Summary

Provide a mechanism to pin and pre-load container images.

**Member:**

what is the motivation for pinning? I understand pre-loading, but disqualifying the images from garbage collection in the kubelet means unused images will clutter the disk and potentially cause the disk to be overused. Is it to prevent cri-o from removing them on upgrade as it does today?


**Comment:**

The motivation is for the cluster to be able to operate properly in the absence of an external registry. The storage should suffice in these use cases. The idea is to pin only platform and partner images, not all customer images running workloads on this cluster.


**Member:**

if the only objective is to get the clusters to operate without a registry, then simply pre-loading should suffice IMO. pinning doesn't seem necessary to me


**Contributor:**

how would you ensure the pre-loaded images don't get removed by GC before they are needed?


**Member:**

if "The storage should suffice in these use-cases" then GC should never kick in as it's currently only triggered by disk usage hitting a certain threshold


**Contributor Author:**

Note also that users will be adding/modifying/removing their own workloads, which can bring new images, consume disk space and trigger the GC. We don't want the pinned images to be the ones evicted by that. Do we still agree that it is better to explicitly say that some images should never be removed?


**Contributor:**

> Do we still agree that it is better to explicitly say that some images should never be removed?

I do, though we do need to think through what the user experience/recovery/alerting is if they do run out of storage because of pinned images and the system can't evict anything. I.e. how does the user get notified of the problem? And once they hit the problem is the cluster already broken or do they have time to take action before then? If they find out after the disk is full what's the recovery steps and what are the implications for the cluster+workloads until the recovery steps are performed?


**Contributor Author:**

I'd say that problems due to exhausted disk space aren't new: they are already possible, and I assume that your concerns are already somehow addressed. However, there are some differences:

  1. If disk exhaustion happens while trying to pull the images then we will need a mechanism to report it via the API, because this won't be initiated by the kubelet and there is no pod status in which to report it. The status of the PinnedImageSet should be enough for that. CRI-O will still report the issue in its log (if there is space for that), like for any other image.

  2. If disk exhaustion happens after the images have been pulled, then we need to make sure that the kubelet GC doesn't select these images. My understanding is that this is already solved by the pinning support in CRI-O: even if the kubelet asks CRI-O to delete a pinned image, CRI-O will not delete it (see the sketch below). @mrunalp can you confirm/reject this?

  3. If the recovery steps (in documentation) include deleting images then we should amend them to ensure these images aren't removed.

I'll try to include these points in the document.
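
Regarding point 2, the CRI `Image` message carries a `pinned` flag that CRI-O populates for images pinned via its configuration. As a minimal sketch (illustrative only; the socket path, package name and function name are assumptions, not part of the proposal), this is how a node-local tool could list which images CRI-O currently reports as pinned:

```go
package pinstatus

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	criapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// ListPinnedImages returns the images that CRI-O reports as pinned on this
// node; pinned images are the ones image garbage collection must not remove.
func ListPinnedImages(ctx context.Context) ([]string, error) {
	// CRI-O's CRI endpoint is only reachable through the node-local socket.
	conn, err := grpc.Dial(
		"unix:///var/run/crio/crio.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		return nil, fmt.Errorf("connecting to CRI-O: %w", err)
	}
	defer conn.Close()

	resp, err := criapi.NewImageServiceClient(conn).ListImages(ctx, &criapi.ListImagesRequest{})
	if err != nil {
		return nil, fmt.Errorf("listing images: %w", err)
	}
	var pinned []string
	for _, img := range resp.Images {
		if img.Pinned {
			pinned = append(pinned, img.RepoDigests...)
		}
	}
	return pinned, nil
}
```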


**Contributor Author:**

Added this to the risks and mitigations section.


## Motivation

Slow and/or unreliable connections to the image registry servers interfere with
operations that require pulling images. For example, an upgrade may require
pulling more than one hundred images. Failures to pull those images cause
retries that interfere with the upgrade process and may eventually make it
fail. One way to improve that is to pull the images in advance, before they are

**Contributor:**

do we have data indicating inability to pull an image (whether due to an issue in the registry or networking connectivity or whatever) is a common cause of upgrade failures?

i think the more compelling motivation for this EP is to speed up upgrades by pre-pulling images so that when the user actually clicks the "upgrade now" button, the cluster spends less time in an upgrading state (which users tend to perceive as an unstable/risky state to be in)


**Contributor Author:**

I don't have numbers. I know that the upgrade failure rate due to failures to pull images is 100% when there is no registry. That is our main use case.

I am refraining from saying that this speeds up upgrades because in my experience it doesn't. A regular upgrade with a reasonable internet connection takes approx 40 minutes. The same upgrade after pre-pulling the images also takes approx those 40 minutes, at least that is my experience. I am assuming that is different when there is a severe bandwidth limitation, but I didn't test that scenario.


**Comment (@avishayt):**

It's safe to say that downloading the images in advance provides a more consistent upgrade time in environments with limited connectivity. This is important when scheduling upgrades into maintenance windows, even if the upgrade might not otherwise fail.


**Contributor Author:**

Thanks @avishayt, added your comment.


**Member:**

@bparees I know that it's not uncommon for upgrade CI jobs to have some image pulls get throttled, entering ImagePullBackOff, and that's in small five-node clusters; if the clusters were significantly larger the likelihood of throttling would increase because we'd have more individual nodes pulling images.

actually needed, and ensure that they aren't removed.

### User Stories

#### Pre-load and pin upgrade images

As the administrator of a cluster that has a low bandwidth and/or unreliable
connection to an image registry server I want to pin and pre-load all the
images required for the upgrade in advance, so that when I decide to actually

**Member:**

How are we picking the image to pre-load? Pulling the full payload may be more than what's actually needed. Are we going to look at the currently running images and only pull newer references of those from the payload?


**Contributor Author (@jhernand, Oct 11, 2023):**

Determining the set of image references for an upgrade is out of scope for this enhancement. It will be part of a separate enhancement (based on the now obsolete #1432) describing the orchestration of the upgrade without a registry.

Anyhow, the initial idea was to have a single PinnedImageSet with all the payload images, for all the nodes. We added the spec.nodeSelector to be able to have different sets of images for control plane nodes and worker nodes. For example:

```yaml
# For control plane nodes:
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  name: my-control-plane-pinned-images
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""
  pinnedImages:
  ...

---

# For worker nodes:
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: PinnedImageSet
metadata:
  name: my-worker-pinned-images
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  pinnedImages:
  ...
```

I expect that the calculation of those sets of images for the control plane nodes and the worker nodes will be done in advance by CVO or by the tool that creates the upgrade bundle. We don't intend to look at the currently running images for that.

Note that this same PinnedImageSet mechanism could be used by other tools or applications that calculate a set of images looking at the currently running ones; the mechanism doesn't forbid that.


**Comment:**

I'd indeed keep that aside for now. The way those are calculated is parallel to providing the relevant APIs that allow it to work.


**Member:**

I'm fine with not addressing this now and just viewing it as roughly equivalent to mirroring the release image plus any additional images, however long term I'd like to see us move toward all upgrades pre-fetching and pinning images and before we get there I'd like to ensure that we try to limit the images as much as reasonable.

We should however try to make it clear what disk space will be consumed by this effort.

perform the upgrade there will be no need to contact that slow and/or
unreliable registry server and the upgrade will successfully complete in a
predictable time.

#### Pre-load and pin application images

As the administrator of a cluster that has a low bandwidth and/or unreliable
connection to an image registry server I want to pin and pre-load the images
required by my application in advance, so that when I decide to actually deploy
it there will be no need to contact that slow and/or unreliable registry server
and my application will successfully deploy in a predictable time.

### Goals

Provide a mechanism that cluster administrators can use to pin and pre-load
container images.

### Non-Goals

None.

## Proposal

### Workflow Description

1. The administrator of a cluster uses the `ContainerRuntimeConfig` object to

**Member:**

First, I think we should make it very ergonomic to automatically pin the images referenced by the release image. Not sure where that API object would live...maybe something in the CVO?

OK first actually, I think we should just default to having e.g. the MCO do a rolling "pre-upgrade" in addition to its actual upgrade. We could match it to the rollout of the MCD (the daemonset).

Today the MCO does a node-by-node drain+upgrade (according to maxUnavailable). But we can change it to pre-pull images before it starts an upgrade on a node. (Or, for the very first node, it'd be concurrent).

Then our API surface might have a tunable for "maximum number of nodes to preload/pin" like maxUnavailable.


**Comment:**

@cgwalters i agree, but the higher level orchestration should happen at the CVO level and should be covered by another enhancement; the idea was to split between those in order to deliver it as building blocks.

request that a set of container images be pinned and pre-loaded:

**Contributor:**

The user experience of asking the administrator to provide a list of all of the images in one of our payloads is going to be a bit brittle. I understand that we need to be able to pin things that are not in the payload, too, but could we offer some shorthand for release payloads? For example, could we have a separate field for pinning a release image that uses the same pull spec that we use in the ClusterVersion API?


**Contributor:**

alternatively i'd expect this to be wrapped by tooling that is part of the upgrade flow which generates/manages this resource for the user.

that said, i haven't read #1483 yet which is the "bootstrapping + upgrading" portion of this overall effort.

if the intent of this EP is just "make it possible to pull some images to nodes and pin them on those nodes" then i think this is fine as is, and we'd cover "how does this api get configured with the images to be pulled + pinned for an upgrade" as part of the upgrade process/EP.

and i'm strongly supportive of that separation of concerns, we just need to ensure the api being defined here is one that is well aligned to how the upgrade flow will need to use it. (e.g. it probably implies multiple instances of this resource so that the upgrade tooling can create/manage its own "pin these images" list w/o conflicting with any list of images the user is providing for their workloads)


**Contributor Author:**

In #1432 (now closed) we suggested to have the logic to generate the list of images for an upgrade inside CVO. I am trying to keep that out of this EP to keep this simple and independent of upgrades. That part of #1432 will eventually be in a yet to be created EP. Note that it will not be part of #1483, as that is only about not requiring a registry during the upgrade.

I think a reasonable way to avoid conflicts could be to have something like this:

```yaml
pinnedImageSets:
- name: upgrade
  images:
  - quay.io/openshift-release-dev/ocp-release@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
- name: my-workload
  images:
  - quay.io/my-org/my-image-1@sha256:...
  - quay.io/my-org/my-image-2@sha256:...
  ...
```

That would also help to include additional information regarding each set of images. For example, to have different sets of images for different types of nodes we could add a node selector:

```yaml
pinnedImageSets:
- name: upgrade-control-plane
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""
  images:
  - quay.io/openshift-release-dev/ocp-release@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
- name: upgrade-workers
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  images:
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
```

Would that address your concerns?

Note that all this would be part of the ContainerRuntimeConfig object. This "pinned image set" concept could also be a separate CRD, but I am trying to avoid that in order to simplify things.


**Member:**

> That part of #1432 will eventually be in a yet to be created EP. Note that it will not be part of #1483, as that is only about not requiring a registry during the upgrade.

Sounds like you have some material after all for:

> ### Non-Goals
>
> None.


**Contributor Author:**

Thanks @zaneb, added a non-goal for that.


```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ...
spec:
  containerRuntimeConfig:
    pinnedImages:
    - quay.io/openshift-release-dev/ocp-release@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    ...
```

1. The machine config operator ensures that all the images are pinned and

**Member:**

this is one final thing I am not clear on: who is pulling the images?


**Member:**

I feel like we should determine that before we move forward, though I could be convinced otherwise


**Contributor Author:**

The suggestion is to have a new PinnedImageSetController sub-controller in the machine-config-controller. I will try to clarify that in this sentence. It is also explained in the implementation details section:

> The machine-config-controller will grow a new PinnedImageSetController sub-controller that will watch the PinnedImageSet custom resources. When one of those is created, updated or deleted the controller will start a daemon set that will do the following in each node of the cluster:
>
> 1. Verify that there is enough disk space available in the machine to pull the images.
> 2. Create, modify or delete the CRI-O pinning configuration file.
> 3. Reload the CRI-O configuration, with the equivalent of `systemctl reload crio`. Note that currently CRI-O doesn't recalculate the set of pinned images on reload; support for that will need to be added.
> 4. Use the CRI-O gRPC API to run the equivalent of `crictl pull` for each of the images.

This sub-controller can't pull the images directly because that requires talking to the gRPC API of CRI-O, and that is only available via localhost in each node. It is for that reason that I am suggesting to add a daemon set to do it. It could also be part of the machine-config-daemon instead of a new daemon set, but I am not familiar enough with that code to say what is better. I also tried to explain that in the implementation details section.

Ultimately the images will be pulled by the CRI-O instance running in each node; think of it as going to each node and running `crictl pull ...` for each image.
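
To illustrate that last point, here is a minimal sketch of the "equivalent of `crictl pull`" done through the CRI image service on the node-local socket (illustrative only; the socket path, timeout and function name are assumptions, not part of the proposal):

```go
package preload

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	criapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// PullImages asks the CRI-O instance behind the node-local socket to pull
// each image, which is what `crictl pull` does under the covers.
func PullImages(ctx context.Context, images []string) error {
	conn, err := grpc.Dial(
		"unix:///var/run/crio/crio.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		return fmt.Errorf("connecting to CRI-O: %w", err)
	}
	defer conn.Close()

	client := criapi.NewImageServiceClient(conn)
	for _, image := range images {
		// A per-image timeout so that one stuck pull doesn't block the
		// whole set forever; the value here is arbitrary.
		pullCtx, cancel := context.WithTimeout(ctx, 10*time.Minute)
		_, err := client.PullImage(pullCtx, &criapi.PullImageRequest{
			Image: &criapi.ImageSpec{Image: image},
		})
		cancel()
		if err != nil {
			return fmt.Errorf("pulling %s: %w", image, err)
		}
	}
	return nil
}
```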


**Contributor (@sinnykumari, Oct 31, 2023):**

I see this as similar to how the MCO applies kubeletconfig and containerruntime config on a node. The kubeletconfig and containerruntime controllers process the respective CRDs applied to the cluster, which results in a new rendered MachineConfig. The node sub-controller applies the desired config annotation on the node to signal the machine-config-daemon (MCD, running on each node) to apply the change. In the same way, the PinnedImageSetController would process the applied PinnedImageSet CRD and a new rendered MachineConfig would get generated. MCD learns to apply the change by pulling those images using crictl pull and doing whatever needs to be done locally on the node. Alternatively, we can introduce a new daemon if needed due to any unknown limitation in MCD. Note that introducing a new daemon would still require some sort of co-ordination so that they don't go sideways (e.g. MCD starting a drain/reboot due to a change at the same time the new daemon is pulling the images).

pulled in all the nodes of the cluster.

### API Extensions

**Contributor (@sinnykumari, Oct 6, 2023):**

As per my understanding, we will also introduce a pinnedImageSet CRD in MCO?


**Contributor Author:**

Yes, that is what I mention below:

> A new PinnedImageSet object will be added to the machineconfiguration.openshift.io API group.

Am I missing something?


**Contributor:**

By a CRD for PinnedImageSet I meant something similar to the CRDs we have today, like machineconfig and kubeletconfig, so that admins have friendly output to view with oc or an equivalent utility.


**Contributor Author:**

Yes, that is what I mean when I say "object". I replaced "object" with "custom resource" to try to make it clear. So, we will have a new PinnedImageSet custom resource. It is defined in detail in openshift/api#1609.


**Member (@zaneb, Oct 9, 2023):**

For clarity, a new type in the k8s API is a CRD (Custom Resource Definition) and an instance of that type is a CR (or object)</pedantry>


There are no new object kinds introduced by this enhancement, but new fields
will be added to existing `ContainerRuntimeConfig` objects.

**Member:**

I assume it's fine to modify a v1 API in OpenShift.


**Contributor Author:**

Note that we are considering introducing a new PinnedImageSet CR instead of modifying ContainerRuntimeConfig. See here, for example. Would you be in favour of that?


**Contributor:**

> I assume it's fine to modify a v1 API in OpenShift.

as long as it's just an addition of a field, and it has a sane default for upgrading clusters that won't have the value specified, it's ok.

@deads2k i know we've run into cases of clients that were updating resources instead of patching them and as a result were stomping/dropping fields they didn't know about; is that something we worry about when modifying (extending) APIs, or does that just fall into "bad client behavior"?


The new fields for the `ContainerRuntimeConfig` object are defined in detail in
https://github.com/openshift/machine-config-operator/pull/3839.

**Contributor:**

The MCO API and CRDs have moved to the openshift/api repo: https://github.com/openshift/api/tree/master/machineconfiguration/v1. Any API change will happen at the new location.


**Contributor Author:**

Thanks @sinnykumari, I moved the API pull request to that repository: openshift/api#1609.


### Implementation Details/Notes/Constraints

Starting with version 4.14 of OpenShift, CRI-O will have the capability to pin
certain images (see [this](https://github.com/cri-o/cri-o/pull/6862) pull
request for details). That capability will be used to pin all the images
required for the upgrade, so that they aren't garbage collected by the kubelet
and CRI-O.

**Member:**

this is not quite true today: these images aren't currently disqualified from the image removal that CRI-O does on upgrade


**Member:**

FWIW: we're working on dropping this behavior in CRI-O but it will take some time (4.17 time frame maybe??)


**Contributor Author:**

So images will be removed when CRI-O itself is upgraded, regardless of being pinned? If so, is there any way around it? Would it help to disable the crio-wipe service?


**Member:**

yeah there's a setting to disable it. I also think it's reasonable to disqualify pinned images in the restart flow


**Member:**

crio-wipe.service no longer does the image wiping; it happens internally. Setting the `version_file_persist` config field to `""` will disable it.


**Contributor Author:**

Should we include that change to CRI-O (disqualify pinned images in the restart flow) in this enhancement?


**Member:**

yeah I think so


**Contributor Author:**

OK, added a paragraph explaining that.


The changes to pin the images will be done in a `/etc/crio/crio.conf.d/pin.conf`

**Member:**

Admins could override that file, which would cause a race. Do we need something like "multiple config-dirs" in CRI-O or a reserved namespace in the MCO?


**Contributor Author:**

Maybe this file could be named after the ContainerRuntimeConfig UUID (or the PinnedImageSet UUID, if we agree to introduce that). Could be something like this:

`pinned-images-be8545cb-dd0e-48cd-8e4f-687babc5b1b2.conf`

Less likely to conflict with files created by admins, but not totally impossible.


**Member:**

Yeah I think that's better


**Contributor Author:**

OK, added that in the latest patch.

file, something like this:

```toml
pinned_images=[
"quay.io/openshift-release-dev/ocp-release@sha256:...",
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...",
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...",
...
]
```

The images need to be pre-loaded and the CRI-O service needs to be reloaded
when this configuration changes. To support that a new field will be added to
the `ContainerRuntimeConfig` object:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ...
spec:
  containerRuntimeConfig:
    pinnedImages:
    - quay.io/openshift-release-dev/ocp-release@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
    ...
```

When the new `pinnedImages` field is added or changed, the machine config
operator will need to pull those images (with the equivalent of `crictl pull`),

**Member:**

We cannot support private registries here, right? At least no pull secrets.


**Contributor Author:**

I think this supports private registries just fine: it uses the same mechanisms that CRI-O uses to pull any other image.

create or update the corresponding `/etc/crio/crio.conf.d/pin.conf` file and
ask CRI-O to reload its configuration (with the equivalent of `systemctl
reload crio.service`).

**Contributor:**

is the expectation that these images will be pulled+pinned on every node in the cluster?

and that if a new node is added to the cluster, it will also then pull+pin these images when it comes up?

one thing i am wondering about is storage concerns... not all nodes run all images. do we need a nodeSelector field to be included w/ a given list of images, in order to control which nodes actually pull+pin those images?


**Contributor Author:**

Note that #1483 doesn't cover the overall upgrade approach. That was part of the now closed #1432 and will eventually be part of another EP. Our suggestion there is that CVO will be a user of this API: it extracts the list of images from the release image and updates the ContainerRuntimeConfig object accordingly.

Yes, the intent of this EP is just to pull and pin images on nodes, regardless of what those images will be used for.

Yes, the images will be pulled and pinned in all the nodes of the cluster. Or else we can introduce the "pinned image set" concept to associate sets of images with node selectors, as described in another comment.


**Contributor Author:**

When a new node comes up it should honor the ContainerRuntimeConfig settings, and therefore pull and pin the images.


**Contributor:**

> When a new node comes up it should honor the ContainerRuntimeConfig settings, and therefore pull and pin the images.

when a new node comes up you'll also have to unset the "all images pulled+pinned" status condition (wherever/however that is being reported) so that anything waiting on that (such as an upgrade coordinator) knows it now needs to go back to waiting.

i'm guessing you don't expect new nodes to be getting introduced during bootstrap+upgrade, but it feels like a potential for race conditions.


**Contributor Author:**

You are right. Our suggestion in the previous EP was to suspend autoscaling during the upgrade to avoid that potential race.


The machine config operator will then use the gRPC API of CRI-O to run the
equivalent of `crictl pull` for each of the images. When that is completed the

**Member:**

Does that mean we mount crio.sock into the MCO daemons, or do we already do that?


**Contributor Author:**

Yes, that requires access to crio.sock. I don't know if MCO already does that. @sinnykumari @mrunalp do you know? Should I include that in this document or should we leave it as an implementation detail?


**Contributor (@sinnykumari, Oct 5, 2023):**

We don't mount crio.sock today in MCD.

machine config operator will update the new `status.pinnedImages` field of the
rendered machine config:

```yaml
status:
  pinnedImages:
  - quay.io/openshift-release-dev/ocp-release@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
```

**Contributor:**

In a multi-node cluster, there may be different status values for each node. How can we track those?

How are errors reported, so that if a node runs out of disk space or one of the images is inaccessible someone can discover that before trying the upgrade?


**Contributor:**

+1, great point about per node status


**Contributor Author:**

For the user of this mechanism (be it the upgrade orchestrator or something else) what is important is if the images it needs have been pulled and pinned or not. That user can check that comparing the set of images it requested with the set of images contained in this status.pinnedImages field. Additional details of the failure can be reported as conditions, as events associated to the nodes, or as log messages.

In a previous incarnation of this EP I proposed to have these details in a new ClusterVersion.status field, but it was suggested that I move it to the machine config because other MCO status-like configuration is there. @cgwalters I think you suggested that, but I don't remember exactly.

If we wanted to make things easy for users of this API (the upgrade orchestrator, for example) what would be convenient is to have some place in the API where they can check if all the images have been pulled and pinned in all the relevant nodes, without having to explicitly compare the sets of images. Something like this inside ContainerRuntimeConfig:

```yaml
status:
  pinnedImageSets:
  - name: upgrade-control-plane
    conditions:
    - type: Ready
      status: "True"
```

With that the user of the API would only have to check if that Ready condition is true.

Errors could then be reported with other conditions, for example:

```yaml
status:
  pinnedImageSets:
  - name: upgrade-control-plane
    conditions:
    - type: Failed
      message: |
        Image 'quay.io/openshift-release-dev/...' can't be pulled because 'node0' doesn't
        have enough disk space
```

Should I move the EP in that direction? Note again that the reason I am not doing that yet is that I was asked to avoid adding a status field to the ContainerRuntimeConfig object. If you think we should move in that direction then it will probably make sense to add a new PinnedImageSet CRD.
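
If the status ends up using standard `metav1.Condition`s as sketched above, a consumer such as an upgrade orchestrator could gate on readiness with the stock apimachinery helpers. A sketch; the `PinnedImageSetStatus` shape here is hypothetical and just mirrors the YAML above:

```go
package pinconditions

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PinnedImageSetStatus mirrors the hypothetical status YAML shown above.
type PinnedImageSetStatus struct {
	Name       string
	Conditions []metav1.Condition
}

// AllReady reports whether every pinned image set has its Ready condition
// true, i.e. all requested images are pulled and pinned on all matching nodes.
func AllReady(sets []PinnedImageSetStatus) bool {
	for _, s := range sets {
		if !meta.IsStatusConditionTrue(s.Conditions, "Ready") {
			return false
		}
	}
	return true
}
```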


**Contributor:**

it does seem like the PinnedImageSet deserves its own status. I wasn't part of the conversation that led to the decision to use ContainerRuntimeConfig or why we can't add more status to it, but it is starting to feel like having a specific CRD to drive and report on image pulling+pinning would be appropriate, as the api seems like it is going to grow in complexity.


**Contributor Author:**

Changed the document to use a new PinnedImageSet object instead of modifying the ContainerRuntimeConfig object.


### Risks and Mitigations

None.

### Drawbacks

This approach requires non-trivial changes to the machine config operator.

**Contributor:**

At least as currently defined, another drawback is the implication that every pinned image is pulled/pinned on every node, even though most of the nodes will never run pods using a given image (e.g. master nodes will never run a user workload image, and worker nodes will never run control plane images, but there are more subtle cases too).


**Contributor Author:**

Yes, there are subtleties, like compact three node clusters, where control plane nodes do run workloads. I think we can address this concern with the "pinned image set" concept that includes a node selector.


**Member:**

currently ctrcfgs are run per MCP, so a different set could be configured per control plane and worker pools, right?


**Contributor Author:**

I think we can support different sets of images for control plane and workers by adding a nodeSelector to the pinned image set, something like this:

```yaml
pinnedImageSets:
- name: upgrade-control-plane
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""
  images:
  - quay.io/openshift-release-dev/ocp-release@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  ...
- name: upgrade-workers
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  images:
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
  - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...
```
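
Evaluating such a nodeSelector against a node would be the standard Kubernetes label-selector matching; a brief sketch, assuming the field is a `metav1.LabelSelector` as in the examples above:

```go
package pinselect

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// NodeMatches reports whether a node is selected by a pinned image set's
// nodeSelector, with the same semantics as any other Kubernetes selector.
func NodeMatches(selector *metav1.LabelSelector, node *corev1.Node) (bool, error) {
	if selector == nil {
		// No selector means the set applies to every node.
		return true, nil
	}
	sel, err := metav1.LabelSelectorAsSelector(selector)
	if err != nil {
		return false, err
	}
	return sel.Matches(labels.Set(node.Labels)), nil
}
```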


## Design Details

### Open Questions

None.

### Test Plan

We will add a CI test that verifies that images are correctly pinned and pre-loaded.

### Graduation Criteria

The feature will ideally be introduced as `Dev Preview` in OpenShift 4.X,
moved to `Tech Preview` in 4.X+1 and declared `GA` in 4.X+2.

#### Dev Preview -> Tech Preview

- Availability of the CI test.

- Obtain positive feedback from at least one customer.

#### Tech Preview -> GA

- User facing documentation created in
[openshift-docs](https://github.com/openshift/openshift-docs).

#### Removing a deprecated feature

Not applicable, no feature will be removed.

### Upgrade / Downgrade Strategy

Not applicable.

### Version Skew Strategy

Not applicable.

### Operational Aspects of API Extensions

Not applicable, there are no API extensions.

#### Failure Modes

#### Support Procedures

## Implementation History

There is an initial prototype exploring some of the implementation details
described here in this [repository](https://github.com/jhernand/upgrade-tool).

## Alternatives

The alternative to this is to manually pull the images in all the nodes of the
cluster, manually create the `/etc/crio/crio.conf.d/pin.conf` file and manually
reload the CRI-O service.

## Infrastructure Needed

Infrastructure will be needed to run the CI test described in the test plan
above.