Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1alpha4 -> v1beta1 clusterctl upgrade test #1810

Merged

Conversation

shysank
Copy link
Contributor

@shysank shysank commented Oct 29, 2021

What type of PR is this?
/kind other

What this PR does / why we need it:
Adds v1alpha4 -> v1beta1 upgrade test
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1809

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot
Copy link
Contributor

@shysank: The label(s) kind/other cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?
/kind other

What this PR does / why we need it:
Adds v1alpha4 -> v1beta1 upgrade test
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1809

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 29, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Oct 29, 2021
@shysank
Copy link
Contributor Author

shysank commented Oct 29, 2021

/test pull-cluster-api-provider-azure-capi-e2e

go.mod Outdated
sigs.k8s.io/cluster-api v1.0.0
sigs.k8s.io/cluster-api/test v1.0.0
sigs.k8s.io/controller-runtime v0.10.2
sigs.k8s.io/cluster-api v0.0.0-20211028170527-10d89ceca938
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

until this is in a CAPI release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 29, 2021
@CecileRobertMichon
Copy link
Contributor

the pull CAPI job has gotten really long recently with all the new test specs + clusterctl upgrade. Wondering if we should breakout api upgrade into its own job so it runs in parallel, or if there is any way of increasing parallelism on the existing jobs. wdyt?

@shysank
Copy link
Contributor Author

shysank commented Oct 29, 2021

the pull CAPI job has gotten really long recently with all the new test specs + clusterctl upgrade. Wondering if we should breakout api upgrade into its own job so it runs in parallel, or if there is any way of increasing parallelism on the existing jobs. wdyt?

We could make it parallel by using ginkgo's parallelism, don't think we are doing it now. I see we set GINKGO_NODES but I think we should also add -p to enable parallelism. But, I'm not sure how the environment variables would work in that case. For the longer term, we should try to make tests not rely on any global state, and then enable parallelism. Looking at the times, the total time is ~440 minutes (7hrs 20mins), it does look like tests are run in parallel. Maybe, increase the number of nodes and try? But, the easier short term fix would be to just have another job, as you've pointed out. I'd even go as far to split them in half to save more time. It probably won't make sense to look at something like capi_e2e_part1 and capi_e2e_part2. Maybe we can overcome that by doing some intelligent grouping or having some kind of description that says what tests are being run.

@CecileRobertMichon
Copy link
Contributor

I'm pretty sure we do use parallelism, if you look at a recent run eg. https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1810/pull-cluster-api-provider-azure-capi-e2e/1454146115451490304, the total duration was 2h48, with 14 tests that ran at around ~30 minutes each (rough average), if they were running serially that would have taken 7 hours. The fact that we're running them on 3 Ginkgo nodes makes sense since that's roughly 3x the actual duration.

@shysank
Copy link
Contributor Author

shysank commented Oct 29, 2021

I'm pretty sure we do use parallelism, if you look at a recent run eg. https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1810/pull-cluster-api-provider-azure-capi-e2e/1454146115451490304, the total duration was 2h48, with 14 tests that ran at around ~30 minutes each (rough average), if they were running serially that would have taken 7 hours. The fact that we're running them on 3 Ginkgo nodes makes sense since that's roughly 3x the actual duration.

yeah just saw that :) updated my comment

@CecileRobertMichon
Copy link
Contributor

race condition :)

let's try to bump the number of nodes and see if it does anything good/bad? we might be limited on the prow side of things

@shysank shysank force-pushed the v1alpha4_v1beta1_upgrade_test branch from e349d04 to 73c13e6 Compare November 15, 2021 23:34
@shysank
Copy link
Contributor Author

shysank commented Nov 15, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Copy link
Contributor Author

shysank commented Nov 16, 2021

KCP upgrade in HA cluster failed, unrelated to this pr, v1alpha4 -> v1beta1 upgrade test passed though. Trying again.
/test pull-cluster-api-provider-azure-capi-e2e

@shysank
Copy link
Contributor Author

shysank commented Nov 16, 2021

/hold cancel
@CecileRobertMichon This pr is ready now with capi upgraded to v1.0.1, ptal whenever you get a chance.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2021
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Test started yesterday at 9:35 PM passed after 2h31m1s.

did you get anywhere with #1816? 2.5 hours is really long for a PR test

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@shysank
Copy link
Contributor Author

shysank commented Nov 16, 2021

did you get anywhere with #1816? 2.5 hours is really long for a PR test

I tried a bunch of things. There is some weird things happening with ginkgo nodes which I can't find the root cause yet. I'll keep trying whenever I get time, but meanwhile, wdyt think about splitting the jobs as you had suggested earlier?

@CecileRobertMichon
Copy link
Contributor

+1 to splitting the clusterctl upgrade tests into a separate job

@CecileRobertMichon
Copy link
Contributor

We could also remove the KCP upgrade specs from the CAPI job and run them as part of k8s upgrade jobs once #1735 merges, see kubernetes-sigs/cluster-api#4896 (comment)

@@ -257,5 +258,54 @@ var _ = Describe("Running the Cluster API E2E tests", func() {
}
})
})

Context("upgrade from v1alpha4 to v1beta1, and scale workload clusters created in v1alpha4", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a tag to the context so it's easier to filter these out as part of #1869? Something like [API-Version-Upgrade]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the other clusterctl upgrade spec that's already merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I've created a context API Version Upgrade that embeds the 2 upgrade tests. Is that okay? Happy to switch it to the way you've suggested if that's the convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that should work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you going to open a PR to add it to Ginkgo skip and add a separate job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptal at kubernetes/test-infra#24392 whenever you get a chance

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021
@shysank
Copy link
Contributor Author

shysank commented Nov 17, 2021

/test pull-cluster-api-provider-azure-capi-e2e

@CecileRobertMichon
Copy link
Contributor

/test ls

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-e2e-windows-dockershim
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-coverage
  • /test pull-cluster-api-provider-azure-e2e-exp
  • /test pull-cluster-api-provider-azure-e2e-full
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade-1-21-1-22
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade-1-22-latest-main
  • /test pull-cluster-api-provider-azure-upstream-windows-dockershim
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-dockershim-upstream-with-ci-artifacts

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-coverage
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-e2e-exp
  • pull-cluster-api-provider-azure-e2e-windows-dockershim
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/test ls

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon
Copy link
Contributor

/test pull-cluster-api-provider-azure-apiversion-upgrade
/test pull-cluster-api-provider-azure-capi-e2e

@k8s-ci-robot
Copy link
Contributor

@shysank: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-capi-e2e ee7a6ed link false /test pull-cluster-api-provider-azure-capi-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon
Copy link
Contributor

/lgtm
/approve

KCP upgrade spec failed but the expected tests ran

@CecileRobertMichon
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 17, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2021
@CecileRobertMichon
Copy link
Contributor

/retest

@k8s-ci-robot k8s-ci-robot merged commit 5cb4494 into kubernetes-sigs:main Nov 18, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.1 milestone Nov 18, 2021
@CecileRobertMichon
Copy link
Contributor

@shysank now that this merged, can we add it to the other beta release branch and as a periodic job?

@shysank
Copy link
Contributor Author

shysank commented Nov 18, 2021

@shysank now that this merged, can we add it to the other beta release branch and as a periodic job?

do you mean backporting this pr to release-1.x, and add a job in cluster-api-provider-azure-periodics-release-v1beta1 ?

@CecileRobertMichon
Copy link
Contributor

we should add least run it periodically on the main branch, in cluster-api-provider-azure-periodics-release-main.

Separately, we might also want to backport this to release-1.0 so we can add an apiversion test for both cluster-api-provider-azure-periodics-release-v1beta1 and cluster-api-provider-azure-pr-release-v1beta1 (the latter would be run on backport PRs). Would backporting this PR be feasible, including capi 1.0.1 bump?

@shysank
Copy link
Contributor Author

shysank commented Nov 18, 2021

we should add least run it periodically on the main branch, in cluster-api-provider-azure-periodics-release-main.

+1

Separately, we might also want to backport this to release-1.0 so we can add an apiversion test for both cluster-api-provider-azure-periodics-release-v1beta1 and cluster-api-provider-azure-pr-release-v1beta1 (the latter would be run on backport PRs). Would backporting this PR be feasible, including capi 1.0.1 bump?

yeah it should be possible to backport

@k8s-infra-cherrypick-robot

@shysank: new pull request created: #1878

In response to this:

/cherry-pick release-1.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shysank
Copy link
Contributor Author

shysank commented Nov 18, 2021

ah deleted the comment by mistake :( the bot works though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add e2e upgrade tests for v1alpha4 -> v1beta1 #1727
4 participants