Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRAYSAT-1767: Set cray-sat version in csm-config #2962

Merged
merged 5 commits into from
Oct 26, 2023

Conversation

haasken-hpe
Copy link
Contributor

Summary and Scope

Add an initContainer definition to the values defined for the csm-config Helm chart. This initContainer sets the cray-sat image tag to match the version included in the CSM build, so that they are always consistent. The Ansible content in csm-config sets this cray-sat container image tag in the file /opt/cray/etc/sat/version, so that the sat-podman wrapper script knows which version of the image to run.

Add logic to release.sh that inspects the container image tags specified in docker/index.yaml and injects those tags for the cray-sat container image and the Alpine Linux container image into the csm-config Helm chart values in the Loftsman manifest.

Modify lib/setup-nexus.sh to set the cray-sat container image version and tag the latest one as csm-latest. We are phasing out reliance on this csm-latest tag, but it is still the default used by the sat-podman wrapper script if the file /opt/cray/etc/sat/version does not exist. Modify lib/setup-nexus.sh to also write the tag of the cray-sat image to /opt/cray/etc/sat/version as well. This is only executed on ncn-m001, but that is sufficient for bootstrapping. When the CSM layer of the CFS configuration is applied, this file will be set on all master and worker management nodes.

Issues and Related PRs

Testing

Tested on:

  • mug

Test description:

I've done various manual tests of each piece of this.

I have tested that shipping a similar Loftsman manifest on mug does result in the correct content being uploaded to the csm-config-management VCS repository. I have tested the individual yq commands and sed commands added to the release.sh script separately outside of their full context.

Risks and Mitigations

This adds an initContainer to the csm-config Helm chart, and if that fails, the import of the csm-config-management repo to VCS will fail.

It's also possible I don't understand the build process fully, and I may have done something incorrectly that won't work due to the ordering of build steps. This should be reviewed carefully.

Pull Request Checklist

  • Version number(s) incremented, if applicable
  • Copyrights updated
  • License file intact
  • Target branch correct
  • Testing is appropriate and complete, if applicable
  • HPC Product Announcement prepared, if applicable

@haasken-hpe
Copy link
Contributor Author

I didn't realize I didn't get any automatic reviewers. Added reviewers.

  • @rbak-hpe to review the init container definition mostly
  • @mtupitsyn to review the changes regarding query of container images from the docker index and other build concerns
  • @heemstra to review the install changes to lib/setup-nexus.sh mostly
  • @mharding-hpe for awareness

docker/index.yaml Outdated Show resolved Hide resolved
docker/index.yaml Outdated Show resolved Hide resolved
@haasken-hpe
Copy link
Contributor Author

I realized last week that this is going to require one more change in order to work correctly.

Today, the SAT product stream uploads the cray-sat container image to the image registry in Nexus as cray/cray-sat. CSM uploads the cray-sat container image to the image registry in Nexus as artifactory.algol60.net/sat-docker/stable/cray-sat.

Formerly only the SAT product stream used to deliver the Ansible that caused the file /opt/cray/etc/sat/version to be created. Thus, if this file exists, the sat wrapper script provided by cray-sat-podman assumes that the image in Nexus is cray/cray-sat, which is where it is delivered by the separate SAT product stream. Now that the CSM product delivers the Ansible content that populates the file /opt/cray/etc/sat/version, we need to modify the cray-sat-podman wrapper script logic to always use the CSM-provided path in the image registry in Nexus.

This will create a backwards compatibility issue. If a newer version of cray-sat-podman is installed that has this changed logic, it will no longer be able to find the cray-sat images delivered by older versions of SAT. I think this is okay because people should not need to roll back to older cray-sat versions, and if they really need to, it would be easy enough to do by modifying the sat wrapper shell script or by moving images in Nexus.

Here are the steps I'm going to take to handle this:

  • Change the sat builds to publish to csm-docker instead of sat-docker. The way CSM uploads container images to Nexus, it keeps the Artifactory path in the image names. I eventually planned to move SAT container images to publish to csm-docker in CRAYSAT-1771. Since I'm going to be changing the code in sat-podman that looks up the image in Nexus anyway, I might as well make the change to the image publishing location now, so I don't have to make two backwards incompatible changes one after the other. I will probably bump the minor version in cray-sat when I do this because if I need to patch the version of the cray-sat image included in CSM 1.5, I'm going to want to have room to bump just the patch version.
  • Change the logic in the sat-podman wrapper script to only look for the cray-sat container image in the new csm-docker path in artifactory. This will probably be a major version bump, since it's incompatible with older versions of SAT.

Until I complete those changes, this should not be merged.

Add an initContainer definition to the values defined for the
`csm-config` Helm chart. This initContainer sets the cray-sat image tag
to match the version included in the CSM build, so that they are always
consistent. The Ansible content in csm-config sets this cray-sat
container image tag in the file `/opt/cray/etc/sat/version`, so that the
sat-podman wrapper script knows which version of the image to run.

Add logic to `release.sh` that inspects the container image tags
specified in `docker/index.yaml` and injects those tags for the cray-sat
container image and the Alpine Linux container image into the csm-config
Helm chart values in the Loftsman manifest.

Modify `lib/setup-nexus.sh` to set the `cray-sat` container image
version and tag the latest one as `csm-latest`. We are phasing out
reliance on this `csm-latest` tag, but it is still the default used
by the sat-podman wrapper script if the file `/opt/cray/etc/sat/version`
does not exist. Modify `lib/setup-nexus.sh` to also write the tag of the
`cray-sat` image to `/opt/cray/etc/sat/version` as well. This is only
executed on `ncn-m001`, but that is sufficient for bootstrapping. When
the CSM layer of the CFS configuration is applied, this file will be set
on all master and worker management nodes.

Test Description:
Only tested one piece at a time so far.
Update `lib/setup-nexus.sh` to ensure the directory `/opt/cray/etc/sat`
exists before writing to the `version` file in that directory. On an
upgrade, this directory should already exist because it will have been
created by management node personalization performed by CFS. However, on
a fresh install, this script is executed on the PIT, and this directory
will not exist. In fact, in that case, `sat` is not even installed on
the PIT, so writing to this file won't affect anything, but it also
won't hurt anything.
Use the rolling tag of `alpine:3` instead of `alpine:3.18` since we do
not depend on any particular functionality specific to a version of the
`alpine` container in its usage in the `csm-config` Helm chart's
initContainer.
Update the `sysmgmt.yaml` Loftsman manifest to specify the latest 1.17.0
version of the `csm-config` Helm chart. This is the newly published
release which includes the `csm.ncn.sat` role which is being modified by
the `initContainer` which is being used here.
The cray-sat image now publishes to csm-docker, and this is where the
newer 3.0.0 version of cray-sat-podman expects it to be uploaded to
Nexus on the system. Make the following changes:

* Download cray-sat from csm-docker. This also has the side effect of
  uploading to the same path in Nexus upon installation
* Change code in `lib/setup-nexus.sh` to tag the image as `csm-latest`
  in its new location in Nexus
* Modify the `yq` command in `release.sh` that reads the `cray-sat`
  container image version from `docker/index.yaml`
* Remove harmless but now unnecessary condition on `sat-docker` path in
  `build/images/inspect.sh`

Test Description:
Tested the `yq` command independently to ensure it could still get the
version of `cray-sat`.
@haasken-hpe haasken-hpe force-pushed the CRAYSAT-1767-customize-sat-version branch from 8b1bc04 to 446d10b Compare October 25, 2023 21:10
@haasken-hpe
Copy link
Contributor Author

The actions from my previous comment here have been completed. Here are the PRs, which have been merged:

There is still a PR which should be merged in metal-provision, but it should be timed to merge with this change, so both changes appear in the same CSM 1.6.0 build. Here is that PR:

@heemstra heemstra merged commit 89b1ec6 into release/1.6 Oct 26, 2023
2 checks passed
@heemstra heemstra deleted the CRAYSAT-1767-customize-sat-version branch October 26, 2023 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants