WIP: ✨ Flexible Nova microversions #1567

lentzi90 · 2023-05-29T08:11:44Z

This is related to the proposal in #1565 and is just meant as a discussion starter for now. If you have any comments, ideas or suggestions, please feel free to comment here or in #1565.

What this PR does / why we need it:

Use a list of supported versions instead of a single hard-coded one. The list can be filtered based on specific feature requirements (e.g. usage of server tags). The version to use is then picked from the intersection of this list and what the server supports.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1448

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

- squashed commits
- if necessary:
  - includes documentation
  - adds unit tests

/hold

k8s-ci-robot · 2023-05-29T08:11:45Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

netlify · 2023-05-29T08:11:50Z

✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name	Link
🔨 Latest commit	`3ab8a84`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/671b608060318d000834abd1
😎 Deploy Preview	https://deploy-preview-1567--kubernetes-sigs-cluster-api-openstack.netlify.app

jichenjc · 2023-05-29T23:49:31Z

pkg/cloud/services/compute/instance.go

+		for i := range recognizedMicroversions {
+			if recognizedMicroversions[i].ID == "2.1" {
+				continue
+			}


What's the 2.1 handling here? I think we have a default (2.1) and then choose a best fit (2.53, 2.60, ...).
I just don't understand the special handling here.

Good question! I'm not sure if this is how we should do it, but I can explain what I'm trying to do here.
Before we set the minimum version we didn't set any version at all. If I understand correctly that would mean the same thing as setting 2.1.

> If no version is specified then the API will behave as if a version request of v2.1 was requested.

From https://docs.openstack.org/nova/latest/reference/api-microversion-history.html

So what I'm trying to do here is to allow fallback to the default if possible. If tags are used, then we require 2.53 minimum so then I remove 2.1 from the versions that it can pick from. Probably a better way to do it would be to convert the versions to floats and then remove anything that is < 2.53, but for now I'm just playing with it.

Incidentally, however we do it, I'd probably go with ints rather than floats and drop the `2.` prefix.
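To illustrate the ints-versus-floats point, here is a minimal sketch that compares microversions by their integer minor part. The helpers `parseMinor` and `atLeast` are hypothetical names invented for this example; they are not part of CAPO or gophercloud. The version `2.100` is used only to show where a float comparison would go wrong (as a float it equals 2.1, which sorts below 2.53).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMinor extracts the minor part of a Nova microversion string such as
// "2.53". Hypothetical helper for illustration only.
func parseMinor(v string) (int, error) {
	major, minor, ok := strings.Cut(v, ".")
	if !ok || major != "2" {
		return 0, fmt.Errorf("unsupported microversion %q", v)
	}
	n, err := strconv.Atoi(minor)
	if err != nil {
		return 0, fmt.Errorf("invalid minor version in %q: %w", v, err)
	}
	return n, nil
}

// atLeast reports whether microversion v is greater than or equal to
// required, comparing the integer minor parts rather than floats.
func atLeast(v, required string) (bool, error) {
	got, err := parseMinor(v)
	if err != nil {
		return false, err
	}
	want, err := parseMinor(required)
	if err != nil {
		return false, err
	}
	return got >= want, nil
}

func main() {
	for _, v := range []string{"2.1", "2.53", "2.60", "2.100"} {
		ok, err := atLeast(v, "2.53")
		fmt.Printf("%s >= 2.53: %v (err: %v)\n", v, ok, err)
	}
}
```

With integers, filtering out everything below the tags requirement becomes a plain `>=` check instead of a special case for `"2.1"`.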

I wonder if this could be something like:

    compute := s.getComputeClient()
    if len(instanceSpec.Tags) > 0 {
        err := compute.RequireMicroVersion(NovaTagging)
        if err != nil {
            ...tagging not supported...
        }
    }

We could have similar for neutron extensions.

Thoughts:

- How can we inform the user up front if their configuration requires more than their cloud supports?
- If we can't do it up front, how do we make the best UX we can in the failure case?
- How do we do the 'we need the multi-attach microversion' thing, because we can't detect that from k8s objects?

On the latter point: we could pre-fetch the specified volume type, or infer the default volume type (I don't know if this is possible), and check if multi-attach is set. Or we could react to a failure by trying again with a higher microversion, but the granularity of OpenStack error codes likely makes this messy.

> how can we inform the user up front, if their configuration requires more than their cloud supports?

I think upfront is impossible here. We don't do any API calls in the webhook and for good reason, so I don't see how we would be able to do it unfortunately.

> If we can't do it up front, how do we make the best UX we can in the failure case?

Probably the best we can do here is to bubble up the error to the CAPI level where it will be visible on the Machine or the Cluster.

> How do we do the 'we need the multi-attach microversion' thing, because we can't detect that from k8s objects?
> On the latter point: we could pre-fetch the specified volume-type, or infer the default volume type (don't know if this is possible) and check if multi-attach is set. Or we could react to a failure by trying again with a higher microversion, but the granularity of OpenStack error code likely makes this messy.

Now we are getting to the core of the issue! I think the reasonable thing to do is to keep a list of supported versions with higher priority for higher versions (as seen in this PR). That way the highest supported version would be picked (I hope), which should support as many features as possible. My thinking is that this would solve the issue without complicating the code with extra checks, errors and retries.

However, it builds on the assumption that picking the highest supported version is always best and I'm not completely sure if this is true. Could there be a case similar to the multi-attach volume situation, where OpenStack is configured in a way that a microversion is in theory supported, but in practice a lower version is needed because of specific-volume-type-or-similar set up by the admins? I want to think the answer is no and that it would be safe to always pick the highest available versions.

jichenjc · 2023-05-30T00:01:57Z

pkg/clients/compute.go

+var NovaSupportedVersions = []*utils.Version{
+	{ID: "2.1", Priority: 10, Suffix: "/v2.1/"},   // Initial microversion release, same as specifying no version
+	{ID: "2.53", Priority: 20, Suffix: "/v2.53/"}, // Maximum in Pike
+	{ID: "2.60", Priority: 30, Suffix: "/v2.60/"}, // Maximum in Queens


We forced 2.60 in commit 2df3778, so I think some specific calls should use that version, but I don't see any such negotiation in the following code. Can you point me to the logic that chooses 2.60 after this change? Thanks.

The version is chosen here. It will check what versions are supported by the server and then pick a version from this list that is supported. I also made sure to put highest priority for 2.60 so this is what should be preferred if it is available.

dulek · 2023-05-30T17:00:31Z

pkg/clients/compute.go

 CAPO supports multiattach volume types, which were added in microversion 2.60.
 */
-const NovaMinimumMicroversion = "2.60"
+var NovaSupportedVersions = []*utils.Version{
+	{ID: "2.1", Priority: 10, Suffix: "/v2.1/"},   // Initial microversion release, same as specifying no version


I'd put the bar higher here. It's hard to correlate, but we might be implicitly depending on some features only available later. Starting from Newton ¹ would make sense to me.

Footnotes

¹ https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#maximum-in-newton

Fair point! I picked 2.1 because that seems to be what is used if no version is specified, so this would have been the situation before CAPO specified a version. Basically this list then represents the history of CAPO:

- No version (=2.1)
- 2.53 (added as requirement for tags)
- 2.60 (added as requirement for multi-attach volumes)

That said, I agree that it could be a good idea to set the bar higher and 2.38 sounds like a good starting place.

Newton is old enough, so we should be fine setting the bar at Newton,
but I'd like to make it a configuration variable so someone can change it if they really want.

> but I'd like to make it configuration variable so someone might change it if they really want

This was also my original suggestion. I wanted to make CAPO/Gophercloud pick up the microversion from clouds.yaml. This way the user could set the version in there if needed and it would work the same way as for the OpenStack CLI. However, there were objections since it would be very easy to break things by setting versions that CAPO cannot work with. So now I'm trying to propose a way where CAPO still has control but it is a bit more flexible.

lentzi90 · 2023-06-14T09:44:46Z

Updated! I added 2 more commits: one to "undo" the previous changes so it is easier to compare different approaches, and one with new changes. What I have done is basically this:

- Use integers instead of strings and floats when comparing versions.
- Use a separate `RequireMicroversion` method instead of passing the requirements to `getComputeClient`. I'm a bit torn on this. It is nice to not have to pass the required version around, but it also makes it easier to forget to set any requirements. I still added a version negotiation without specific requirements to the `NewComputeClient` function to avoid this "no version set" issue.
- Set a higher lowest version. As suggested, I put the bar at 2.38 now instead of 2.1. I still think there is a valid reason to go with 2.1 (since that is the default when nothing is specified), but I also think it is so old that we should not have to support it.

pkg/clients/compute.go

mdbooth · 2023-06-19T15:45:03Z

pkg/clients/compute.go

+	if err != nil {
+		return nil, fmt.Errorf("failed to negotiate compute service version: %w", err)
+	}
+	compute.Microversion = version.ID


This method needs to return a deterministic microversion. It can't be a negotiated version: it needs to be success or failure. I want my code to run with microversion X or nothing (ok, we're going to have to support 2 for volume creation/attachment, but... details).

How about:

- `NewComputeClient` always returns microversion 38.
- A new method `WithMicroversion(microversion int) (ComputeClient, error)` returns a copy of itself with the target microversion if it is supported?

So most code is unchanged, and the only effect is that it's now running against an older microversion.

Code which requires a newer microversion does something like:

    { // Lexical scope limits new compute client
        compute, err := compute.WithMicroversion(TAGS)
        if err ...
    }

Hmm I'm not sure I understand. If we have deterministic microversion and no negotiation, then how would this return failure? Then we end up with the same as we already have where CAPO says the microversion is X and then we will see when making the first API call if that works or not. With negotiation we get to know this already here.
I can accept negotiating with a single microversion (so it is that microversion or nothing), but then I still think it is better to do the negotiation than just setting it without checking if it is supported.

I anticipate that we'd fetch the maximum supported microversion from Nova when initially creating the service client and store it in the service client. We would then use this every time we needed to check that the required microversion was supported. i.e.

On instantiating the service client we check that microversion 38 is supported because that's the minimum. This is the microversion that all code without a higher requirement will run against.

When adding tags to a server before creation we'd do something like:

    createClient, err := compute.WithMicroversion(TAGS)
    if err != nil {
        createClient = compute
    } else {
        ... add tags to server create opts ...
    }
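A self-contained sketch of this proposal might look like the following. The `ComputeClient` struct here is a stand-in, not CAPO's real client, and the constant names, the `serverMax` field and the value passed to `NewComputeClient` are assumptions made for illustration. The maximum is taken as a plain argument instead of being fetched from Nova, and the server's minimum is not checked.

```go
package main

import "fmt"

const (
	// MinimumMicroversion is the 2.38 floor discussed in this thread.
	MinimumMicroversion = 38
	// TagsMicroversion is the 2.53 requirement for server tags per this thread.
	TagsMicroversion = 53
)

// ComputeClient is a simplified stand-in for the real client wrapper.
type ComputeClient struct {
	Microversion int // minor version used for requests
	serverMax    int // maximum minor version reported at creation time
}

// NewComputeClient fails unless the minimum microversion is supported and
// otherwise always returns a client pinned to that minimum.
func NewComputeClient(serverMax int) (*ComputeClient, error) {
	if serverMax < MinimumMicroversion {
		return nil, fmt.Errorf("server maximum 2.%d is below required minimum 2.%d",
			serverMax, MinimumMicroversion)
	}
	return &ComputeClient{Microversion: MinimumMicroversion, serverMax: serverMax}, nil
}

// WithMicroversion returns a copy of the client pinned to the required
// microversion, or an error if the server does not support it.
func (c *ComputeClient) WithMicroversion(required int) (*ComputeClient, error) {
	if required > c.serverMax {
		return nil, fmt.Errorf("microversion 2.%d not supported, server maximum is 2.%d",
			required, c.serverMax)
	}
	clone := *c // copy, so the original client keeps its microversion
	clone.Microversion = required
	return &clone, nil
}

func main() {
	compute, err := NewComputeClient(56) // pretend Nova reported a maximum of 2.56
	if err != nil {
		fmt.Println(err)
		return
	}
	createClient := compute
	if c, err := compute.WithMicroversion(TagsMicroversion); err != nil {
		fmt.Println("tags skipped:", err)
	} else {
		createClient = c
	}
	fmt.Printf("default 2.%d, create 2.%d\n", compute.Microversion, createClient.Microversion)
}
```

Each call site gets either exactly the microversion it asked for or an error, so nothing is negotiated implicitly, while the support check still happens before the first API call.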

mdbooth

I think I can see a way forward with this!

lentzi90 · 2023-09-29T10:05:01Z

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 · 2023-09-29T10:52:07Z

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 · 2023-09-29T12:22:48Z

/test pull-cluster-api-provider-openstack-e2e-test

lentzi90 · 2023-10-04T11:26:50Z

/test pull-cluster-api-provider-openstack-e2e-test
/test pull-cluster-api-provider-openstack-test
/test pull-cluster-api-provider-openstack-build

lentzi90 · 2023-10-04T12:47:59Z

It probably makes sense to move some of this to gophercloud. See gophercloud/gophercloud#2791

lentzi90 · 2023-10-04T12:48:20Z

/test pull-cluster-api-provider-openstack-build
/test pull-cluster-api-provider-openstack-test

k8s-triage-robot · 2024-01-26T04:22:47Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

lentzi90 · 2024-01-30T12:50:57Z

/test pull-cluster-api-provider-openstack-build
/test pull-cluster-api-provider-openstack-test
/test pull-cluster-api-provider-openstack-e2e-test

k8s-triage-robot · 2024-03-29T09:48:02Z

/lifecycle rotten

lentzi90 · 2024-04-02T06:57:04Z

/remove-lifecycle rotten

k8s-triage-robot · 2024-07-01T07:46:31Z

/lifecycle stale

lentzi90 · 2024-07-07T05:45:06Z

/remove-lifecycle stale

k8s-triage-robot · 2024-10-05T06:43:21Z

/lifecycle stale

lentzi90 · 2024-10-24T08:26:48Z

/remove-lifecycle stale

k8s-ci-robot · 2024-10-25T06:53:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lentzi90. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lentzi90 · 2024-10-25T08:04:55Z

/test pull-cluster-api-provider-openstack-build
/test pull-cluster-api-provider-openstack-test
/test pull-cluster-api-provider-openstack-e2e-test

This is an attempt to set the microversion based on what features are needed. For example, if tags are defined for the server, then we need a microversion that supports tags. If no special features are used or needed, then we use the fixed minimum version.
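As a rough sketch of that idea, the required microversion can be derived from the set of features in use. The `feature` type, the map and `requiredMicroversion` are hypothetical names; the numbers (2.38 minimum, 2.53 for tags, 2.60 for multi-attach) are the ones mentioned in this thread. How the caller decides that multi-attach is needed is left open, since the thread notes it cannot be detected from the k8s objects.

```go
package main

import "fmt"

// feature names a capability that needs a newer microversion.
// Hypothetical type for this sketch.
type feature string

const (
	featureTags        feature = "tags"
	featureMultiattach feature = "multiattach"
)

// minimumMicroversion is the fixed floor (2.38) used when no feature
// demands more.
const minimumMicroversion = 38

// featureMicroversion maps each feature to the minor version it needs,
// using the values quoted in this thread.
var featureMicroversion = map[feature]int{
	featureTags:        53,
	featureMultiattach: 60,
}

// requiredMicroversion returns the lowest minor version that satisfies
// every requested feature, never going below the fixed minimum.
func requiredMicroversion(features ...feature) int {
	required := minimumMicroversion
	for _, f := range features {
		if v, ok := featureMicroversion[f]; ok && v > required {
			required = v
		}
	}
	return required
}

func main() {
	fmt.Println(requiredMicroversion())
	fmt.Println(requiredMicroversion(featureTags))
	fmt.Println(requiredMicroversion(featureTags, featureMultiattach))
}
```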

lentzi90 · 2024-10-25T09:38:48Z

/test pull-cluster-api-provider-openstack-build
/test pull-cluster-api-provider-openstack-test

lentzi90 · 2024-10-25T10:09:17Z

/test pull-cluster-api-provider-openstack-e2e-test

k8s-ci-robot requested review from dulek and jichenjc May 29, 2023 08:11

k8s-ci-robot added the size/M and approved labels May 29, 2023

jichenjc reviewed May 30, 2023

dulek reviewed May 30, 2023

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from c38aa7f to cd74f5a June 14, 2023 09:35

mdbooth reviewed Jun 19, 2023

mdbooth reviewed Jun 19, 2023

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from cd74f5a to ab8d3bc September 29, 2023 09:49

k8s-ci-robot added the size/L label and removed the size/M label Sep 29, 2023

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from ab8d3bc to 178d4d0 September 29, 2023 09:54

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from 178d4d0 to 746cf67 September 29, 2023 12:22

mdbooth mentioned this pull request Oct 4, 2023

CAPO should set InternalDNS in status.Addresses #1689

Closed

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from 7433bf2 to bc97d3e October 4, 2023 12:42

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from bc97d3e to f931f03 October 5, 2023 06:08

lentzi90 mentioned this pull request Oct 25, 2023

Add microversion utilities gophercloud/gophercloud#2791

Merged

k8s-ci-robot added the needs-rebase label Oct 28, 2023

k8s-ci-robot added the lifecycle/stale label Jan 26, 2024

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from 21df213 to b584752 January 30, 2024 09:28

k8s-ci-robot removed the needs-rebase label Jan 30, 2024

k8s-ci-robot added the needs-rebase label Feb 28, 2024

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 29, 2024

k8s-ci-robot removed the lifecycle/rotten label Apr 2, 2024

k8s-ci-robot added the lifecycle/stale label Jul 1, 2024

k8s-ci-robot removed the lifecycle/stale label Jul 7, 2024

k8s-ci-robot added the lifecycle/stale label Oct 5, 2024

k8s-ci-robot removed the lifecycle/stale label Oct 24, 2024

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from b584752 to 05bbd05 October 25, 2024 06:53

k8s-ci-robot removed the needs-rebase label Oct 25, 2024

k8s-ci-robot removed the approved label Oct 25, 2024

lentzi90 force-pushed the lentzi90/flexible-nova-version branch from 05bbd05 to 3ab8a84 October 25, 2024 09:10
