Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASMCMS-8758: Update RPM repos so that CFS and image recipes will have the packages they require #2680

Merged
merged 3 commits into from
Aug 10, 2023

Conversation

mharding-hpe
Copy link
Contributor

@mharding-hpe mharding-hpe commented Aug 9, 2023

Update!

Read before merging!

In the original version of this PR, in order to avoid breaking recipes this late in the game, the PR did not remove any RPMs from any repos. If an RPM has changed from being tied to a SLE version to being noos, then the PR added it to the noos repo, and also updated the SLE repos where it is found to include the new noos build of that RPM. The CSM 1.6 version of this PR cleans that up, so that noos RPMs are only in the noos repository.

However, Rusty felt very strongly that we should not add the noos RPMs into the SLE repos. It would not cause any technical problems, but he prefers not to mix them (although it should be noted that we do exactly this kind of mixing in the CSM embedded repository). Because he is adamant about it, and because we need the changes in this PR for CSM 1.5, I have relented and added a commit which removes those RPMs from the SLE repositories, so that they only appear in the noos repo. I did this in a separate commit, so that if @denniswalker or the powers that be prefer to go the more cautious path, I can easily go back to the original version of the PR.

None of this discussion affects the CSM 1.6 backport of this PR, or the metal-provision PRs. It is purely a consideration for this CSM 1.5 PR.

Summary and Scope

This PR updates some RPM versions, adds some RPMs to the Nexus repos, and moves some RPMs around in those repos. This is motivated by the following:

  • csm-config changed in CSM 1.5 so that it only sets up the Zypper repos for the appropriate SLES version where it is running (in addition to the noos repository). This means that for CFS, the RPMs it wants to update can no longer be in a repo for a different SLES release.
  • Some RPMs have changed to being built as noos. But any image recipes which install those RPMs have to specify the repository where they can be found. We have seen problems in the past where recipes stopped working when an RPM was moved to a different repo.
  • Some RPM versions in the csm repo were behind the version of the corresponding RPMs in the metal-provision repo, which is undesirable. The versions in csm should always be >= the versions in metal-provision.
  • Some RPMs are now built both x86 and aarch64,. Now that we are adding support for aarch64 Compute nodes and images, relevant aarch64 RPMs should be uploaded into Nexus
  • A few entries in the RPM manifests had superfluous ".rpm" strings at the end -- this PR cleans those up to match the rest

The version bumps in this PR are just to move to the version of the RPM with the updated build specs (noos, noarch, etc) -- they are not content changes.

This also bumps the csm-config version to one with updated RPM lists for different node types.

Issues and Related PRs

@mharding-hpe mharding-hpe changed the title Update RPM repos so that CFS and image recipes will have the packages they require CASMCMS-8758: Update RPM repos so that CFS and image recipes will have the packages they require Aug 9, 2023
Copy link
Contributor

@rustydb rustydb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear what the mix-n-matching is fixing, we shouldn't be adding noos repos to the sle specific files. csm-config for 1.5 and 1.6 is configured to add sle-{current distro service pack}, noos, and csm-embedded. Other than metal-provision and csm-config, what other recipes need to know?

@mharding-hpe
Copy link
Contributor Author

mharding-hpe commented Aug 9, 2023

It isn't clear what the mix-n-matching is fixing, we shouldn't be adding noos repos to the sle specific files. csm-config for 1.5 and 1.6 is configured to add sle-{current distro service pack}, noos, and csm-embedded. Other than metal-provision and csm-config, what other recipes need to know?

I'd have to defer to @dlaine-hpe , as he is the one who informed us that we broke some recipes when a different RPM was moved to a different repo. But I don't see this being a big deal to have this intermediate step between moving them fully to noos for CSM 1.6. It's definitely the less risky move.

@rustydb
Copy link
Contributor

rustydb commented Aug 9, 2023

Did a partial review, but I am still unsure about the the mix-n-match repos.

@mtupitsyn
Copy link
Contributor

@mharding-hpe You can use the following code to filter out packages, which do not to be under rpm/cray/*, because they are already in embedded repo:

$ git checkout -b CASMCMS-8758-1.5 -t origin/CASMCMS-8758-1.5

# Requires ARTIFACTORY_USER and ARTIFACTORY_TOKEN env variables.
# Validates RPM indexes and generates file named rpm/embedded/index.yaml
$ ./hack/embedded-repo.sh --validate
Downloading package lists ...
List of packages for embedded repo:
    Mesa-22.3.5-150500.75.2
    Mesa-dri-22.3.5-150500.75.2
    Mesa-gallium-22.3.5-150500.75.2

# List packages which can be removed from rpm/cray/*
$ for p in $(grep -r -E '^ +- .*' -h rpm/cray | sed -e 's/^ *- //' | sort -u); do echo "$embedded" | grep $p; done
cfs-debugger-1.4.0-1.noarch
cfs-state-reporter-1.9.3-1.noarch
cfs-trust-1.6.7-1.noarch
cm-cli-1.5.0-1.x86_64
cray-site-init-1.32.0-1.x86_64
csm-auth-utils-1.0.0-1.noarch
csm-node-heartbeat-2.1-3.x86_64
cvt-1.5.9-20230804061353_2d3e92970639.x86_64
hpe-csm-goss-package-0.3.21-hpe3.x86_64
hpe-yq-4.33.3-1.x86_64
ilorest-4.2.0.0-20.x86_64
loftsman-1.2.0-3.x86_64
metal-basecamp-1.2.6-1.x86_64
metal-ipxe-2.4.4-1.noarch
metal-nexus-1.2.3-1.x86_64
metal-observability-1.0.9-1.x86_64
pit-init-1.4.2-1.noarch
platform-utils-1.6.3-1.noarch
smart-mon-1.0.3-1.noarch
spire-agent-1.5.5-1.8.x86_64

@mharding-hpe
Copy link
Contributor Author

mharding-hpe commented Aug 9, 2023

Did a partial review, but I am still unsure about the the mix-n-match repos.

@rustydb It seems like the clearly less risky course to me, because it just removes possible problems. And CSM 1.6 doesn't have it, so it's temporary. I am erring on the side of caution with the changes. If you wish to make a follow-up PR to prune the RPM list, I won't oppose you. I just don't feel confident that removing the RPMs is safe.

@mharding-hpe
Copy link
Contributor Author

mharding-hpe commented Aug 9, 2023

@mharding-hpe You can use the following code to filter out packages, which do not to be under rpm/cray/*, because they are already in embedded repo:

@mtupitsyn Per Rusty, some packages need to be in there even though they are in the embedded repo. I have already modified my PR so that I am not adding any NEW RPMs that are not currently in these repos. I consider it beyond the purview of this PR to prune the list of RPMs that are already in there. I think that you and Rusty should make a follow-up PR to prune the list, but this PR is big enough already.

@mtupitsyn
Copy link
Contributor

mtupitsyn commented Aug 9, 2023

Why would we need aarch64 packages? We only ship x86_64 NCN images, so they are of no use for NCNs. Without them, PR will be much smaller.

@mharding-hpe
Copy link
Contributor Author

mharding-hpe commented Aug 9, 2023

Why would we need aarch64 packages? We only ship x86_64 NCN images, so they are of no use for NCNs. Without them, PR will be much smaller.

@mtupitsyn I only included aarch64 packages for RPMs that we install on compute nodes (or compute node images), as I believe we are adding aarch64 support for compute nodes in CSM 1.5. We added support for them to IMS for this release. That's why many of our RPMs are now built both x86_64 and aarch64. These same repos are used by CFS when installing packages on compute nodes and compute images.

If I'm mistaken about the aarch64 support in this release, or if plans have changed and I wasn't aware of it, then we can take them out.

UPDATE I've discussed with the CMS folks who are heading up the aarch64 initiative and have confirmed that for any RPMs which are installed on compute nodes, if they are not noarch, then we need to have both x86_64 and aarch64 versions available in Nexus.

@rustydb
Copy link
Contributor

rustydb commented Aug 9, 2023

Did a partial review, but I am still unsure about the the mix-n-match repos.

@rustydb It seems like the clearly less risky course to me, because it just removes possible problems. And CSM 1.6 doesn't have it, so it's temporary. I am erring on the side of caution with the changes. If you wish to make a follow-up PR to prune the RPM list, I won't oppose you. I just don't feel confident that removing the RPMs is safe.

Still, do not mix-n-match the repos. Sure it might seem harmless, but so would just making a single index.yml file and copying/symlinking it in all the directories. It isn't about being clean, we really shouldn't mask any issues here given Noos is a 1.5 deliverable and 1.5 isn't out until November.. we have time to point folks at noos.

EDIT: Pete and I were clinically going through these packages for the noos conversion, between docs-csm, csm-config, and metal-provision everything is set currently to look at noos. Outside of CSM we obviously have no control over, and we can either publish those packages to the distros to accommodate or accept the risk and get the team(s) to switch. Since Pete is gone, Mikhail and I are now tag-teaming this. We shouldn't pretend repos have things they don't actually have in artifactory.

@mharding-hpe
Copy link
Contributor Author

mharding-hpe commented Aug 10, 2023

Did a partial review, but I am still unsure about the the mix-n-match repos.

@rustydb It seems like the clearly less risky course to me, because it just removes possible problems. And CSM 1.6 doesn't have it, so it's temporary. I am erring on the side of caution with the changes. If you wish to make a follow-up PR to prune the RPM list, I won't oppose you. I just don't feel confident that removing the RPMs is safe.

Still, do not mix-n-match the repos. Sure it might seem harmless, but so would just making a single index.yml file and copying/symlinking it in all the directories. It isn't about being clean, we really shouldn't mask any issues here given Noos is a 1.5 deliverable and 1.5 isn't out until November.. we have time to point folks at noos.

EDIT: Pete and I were clinically going through these packages for the noos conversion, between docs-csm, csm-config, and metal-provision everything is set currently to look at noos. Outside of CSM we obviously have no control over, and we can either publish those packages to the distros to accommodate or accept the risk and get the team(s) to switch. Since Pete is gone, Mikhail and I are now tag-teaming this. We shouldn't pretend repos have things they don't actually have in artifactory.

I've added a commit with the changes you requested, and I've added a comment at the top of the PR description with an explanation of it. I'll leave it up to Dennis whether he finds the risk acceptable or not.

I would be more inclined to agree with you if I was proposing doing this going forward, instead of just for CSM 1.5. Given that the CSM 1.6 PR doesn't have these issues, I'm not sure why it is such a big deal that CSM 1.5 would be in an intermediate stepping-stone state.

The mix-and-match repo is not unprecedented, as evidenced by the embedded repo. The CSM embedded repo contains both noos and SLE RPMs, sourced from multiple different artifactory repos. So we clearly don't have a philosophical objection to having RPMs from different artifactory repos or with heterogeneous OS types in the same Nexus repo (and RPM repos themselves allow for this as a matter of course -- it's not a "wrong" use of RPM repos).

I'm not sure why you are conflating the Nexus repos with artifactory. We do pull RPMs from artifactory to populate the repos, but I don't think there's anything to suggest that there's a 1-to-1 relationship between repos in Nexus and repos in artifactory. For example, look at the noos repo defined here -- it pulls from two different artifactory sources. And if anyone wanted to know where an RPM in a Nexus repo came from, they'd just go look at the manifest file in here, and it would be obvious where in artifactory it originated.

But like I said, I added a commit with your requested changes, and Dennis/whoever can decide what they're comfortable with. Since it's a separate commit, it's easy to remove if Dennis wants to go the other way.

@mharding-hpe mharding-hpe requested review from rustydb and removed request for rustydb August 10, 2023 16:51
mharding-hpe and others added 3 commits August 10, 2023 13:09
… to noos versions of some RPMs; add aarch64 RPMs for appropriate packages
Remove noos RPMs from SLE repos that were added as a precaution

Co-authored-by: Russell Bunch <[email protected]>
@mharding-hpe mharding-hpe force-pushed the CASMCMS-8758-1.5 branch 3 times, most recently from 1cc6402 to c839985 Compare August 10, 2023 18:52
@gbaker-hpe
Copy link
Contributor

Rusty requested changes which we just discussed on a call. He's good moving this forward.

@gbaker-hpe gbaker-hpe merged commit 77603cd into release/1.5 Aug 10, 2023
1 check passed
@gbaker-hpe gbaker-hpe deleted the CASMCMS-8758-1.5 branch August 10, 2023 19:53
@rustydb
Copy link
Contributor

rustydb commented Aug 10, 2023

Did a partial review, but I am still unsure about the the mix-n-match repos.

@rustydb It seems like the clearly less risky course to me, because it just removes possible problems. And CSM 1.6 doesn't have it, so it's temporary. I am erring on the side of caution with the changes. If you wish to make a follow-up PR to prune the RPM list, I won't oppose you. I just don't feel confident that removing the RPMs is safe.

Still, do not mix-n-match the repos. Sure it might seem harmless, but so would just making a single index.yml file and copying/symlinking it in all the directories. It isn't about being clean, we really shouldn't mask any issues here given Noos is a 1.5 deliverable and 1.5 isn't out until November.. we have time to point folks at noos.
EDIT: Pete and I were clinically going through these packages for the noos conversion, between docs-csm, csm-config, and metal-provision everything is set currently to look at noos. Outside of CSM we obviously have no control over, and we can either publish those packages to the distros to accommodate or accept the risk and get the team(s) to switch. Since Pete is gone, Mikhail and I are now tag-teaming this. We shouldn't pretend repos have things they don't actually have in artifactory.

I've added a commit with the changes you requested, and I've added a comment at the top of the PR description with an explanation of it. I'll leave it up to Dennis whether he finds the risk acceptable or not.

Thank you.
On the contrary, I thought Dennis may have had a hard time accepting what was going on with the giant diff. As thoughtful as we are, we're just speculating.

I would be more inclined to agree with you if I was proposing doing this going forward, instead of just for CSM 1.5. Given that the CSM 1.6 PR doesn't have these issues, I'm not sure why it is such a big deal that CSM 1.5 would be in an intermediate stepping-stone state.

Again, noos is a deliverable for 1.5. The driving folks for this (Pete, myself, Mikhail, and partially Dennis) have take a stepping stone step of having packages continue to publish to ^sle- repos in addition to the noos. Those were each discussed and impleneted after conferring with one another. Since the concern didn't come to light this week, and I didn't explicitly bring it up, I (in hindsight) falsely judged the notions by the CMS team (or just for you and David). In hindsight, to circumvent having this entire thread we should've done the stepping stone thing we already were doing and just publish to both.

The mix-and-match repo is not unprecedented, as evidenced by the embedded repo. The CSM embedded repo contains both noos and SLE RPMs, sourced from multiple different artifactory repos. So we clearly don't have a philosophical objection to having RPMs from different artifactory repos or with heterogeneous OS types in the same Nexus repo (and RPM repos themselves allow for this as a matter of course -- it's not a "wrong" use of RPM repos).

No, I coined mix-and-match/mix-n-match as a reference to pulling an entirely different repo URL into another for these YAML files. The CSM embedded repo is not defined within these YAML files, it has no manifests, it is besides the point. Using it as an example to justify making these files weird is a too much.

I'm not sure why you are conflating the Nexus repos with artifactory. We do pull RPMs from artifactory to populate the repos, but I don't think there's anything to suggest that there's a 1-to-1 relationship between repos in Nexus and repos in artifactory. For example, look at the noos repo defined here -- it pulls from two different artifactory sources. And if anyone wanted to know where an RPM in a Nexus repo came from, they'd just go look at the manifest file in here, and it would be obvious where in artifactory it originated.

I'm sorry if you misunderstood, or interpreted it that way. My words seem very twisted at this point. There's no conflation going on.

If I have a noos package in Artifactory that I want to update the CSM manifests for, it is counter-intuitive to update the non-noos index.yml files. Some people may grep and figure it out, but not everyone does.

But like I said, I added a commit with your requested changes, and Dennis/whoever can decide what they're comfortable with. Since it's a separate commit, it's easy to remove if Dennis wants to go the other way.

Much appreciated. Apologies on the misunderstandings, I did not mean to involve CMS even remotely close to this much in the noos manifest changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants