🌱 test: fix collector for machines not having an IP in status and change ignition ssh user to capv #3010

Merged (2 commits) on May 21, 2024

Conversation

chrischdi
Member

@chrischdi chrischdi commented May 16, 2024

What this PR does / why we need it:

When a test fails before a machine has an IP address set in its status, we are currently unable to collect information such as the cloud-init log.

This PR implements a fallback that gets the IP address from vSphere instead, so that we can still gather information via SSH.
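The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MachineAddress`, `ipLookupFunc`, and `candidateIPs` are hypothetical names, and the vSphere query is abstracted behind a function value so the sketch stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// MachineAddress is a hypothetical stand-in for one address entry in a Machine's status.
type MachineAddress struct {
	Type    string
	Address string
}

// ipLookupFunc is a hypothetical hook that asks the infrastructure
// (for example vSphere) for the IP addresses of a VM.
type ipLookupFunc func(vmName string) ([]string, error)

// candidateIPs returns the addresses found in status when any are set.
// Only when status is empty does it fall back to the infrastructure lookup.
func candidateIPs(vmName string, status []MachineAddress, lookup ipLookupFunc) ([]string, error) {
	var ips []string
	for _, a := range status {
		if a.Address != "" {
			ips = append(ips, a.Address)
		}
	}
	if len(ips) > 0 {
		return ips, nil
	}

	// Status has no usable address yet: ask the infrastructure instead.
	ips, err := lookup(vmName)
	if err != nil {
		return nil, fmt.Errorf("looking up IP addresses for %q: %w", vmName, err)
	}
	if len(ips) == 0 {
		return nil, errors.New("no IP addresses found in status or infrastructure")
	}
	return ips, nil
}
```

With this shape, a machine that already reports an address never triggers the infrastructure query, and a machine that fails early still yields an address to dial as long as the VM has one.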

Example Job: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main/1790940294057824256

Which logs:

Failed to get logs for Machine k8s-conformance-9dmcar-cp-z2zcx-mhzxn, Cluster k8s-conformance-n8g9q1/k8s-conformance-9dmcar: dialing host IP address at : dial tcp :22: connect: connection refused

And the resulting empty files: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/periodic-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main/1790940294057824256/artifacts/clusters/k8s-conformance-9dmcar/machines/k8s-conformance-9dmcar-cp-z2zcx-mhzxn/

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 16, 2024
@chrischdi chrischdi changed the title 🌱 test: fix collector for machines not having an IP in status 🌱 [WIP] test: fix collector for machines not having an IP in status May 16, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 16, 2024
@chrischdi chrischdi changed the title 🌱 [WIP] test: fix collector for machines not having an IP in status 🌱 [WIP] test: fix collector for machines not having an IP in status and change ignition ssh user to capv May 16, 2024
@chrischdi chrischdi changed the title 🌱 [WIP] test: fix collector for machines not having an IP in status and change ignition ssh user to capv 🌱 test: fix collector for machines not having an IP in status and change ignition ssh user to capv May 16, 2024
@chrischdi
Member Author

/test help

@k8s-ci-robot
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
  • /test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
  • /test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
  • /test pull-cluster-api-provider-vsphere-e2e-govmomi-main
  • /test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-30-1-31-main
  • /test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
  • /test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
  • /test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
  • /test pull-cluster-api-provider-vsphere-e2e-supervisor-main
  • /test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-30-1-31-main
  • /test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
  • /test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
  • /test pull-cluster-api-provider-vsphere-test-main
  • /test pull-cluster-api-provider-vsphere-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-vsphere-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-vsphere-apidiff-main
  • pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
  • pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
  • pull-cluster-api-provider-vsphere-test-main
  • pull-cluster-api-provider-vsphere-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member Author

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main

@chrischdi
Member Author

/cherry-pick release-1.10

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.10 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.10


@chrischdi
Member Author

/cherry-pick release-1.9

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.9 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.9


@chrischdi
Member Author

/cherry-pick release-1.8

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.8 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.8


@chrischdi
Member Author

/cherry-pick release-1.7

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.7 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.7


@chrischdi
Member Author

/assign @sbueringer @fabriziopandini

Member

@sbueringer sbueringer left a comment

Nice! just a few nits

test/framework/log/collector.go (four outdated review threads, all resolved)
@chrischdi chrischdi force-pushed the pr-fix-machine-collector branch 2 times, most recently from 39b37f6 to a801c2a Compare May 17, 2024 12:12
@sbueringer
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 17, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: aa72c221c11c0c44678a604323ebdb3751e0a270

@@ -14,3 +14,10 @@ patches:
  - target:
      kind: VSphereMachineTemplate
    path: ../commons/remove-storage-policy.yaml
  # Replace ssh user to match expected user by the e2e machine collector
Member

Q: is it possible to change it in cluster-template-ignition.yaml instead?

Member

@sbueringer sbueringer May 21, 2024

I think we should keep the core user there. It's sort of standard with Ignition I think? (IIRC)

Member Author

Agreed, core is the standard user in Ignition.

var errs []error
// Try with all available IPs unless it succeeded.
for _, machineIPAddress := range machineIPAddresses {
	if err := executeRemoteCommand(f, machineIPAddress, command, args...); err != nil {
Member

Q: does this imply that we expect SSH connectivity from the prow container to the machine? Is this a new constraint introduced by this PR, or does it already exist for something else? (I'm thinking about constraints for the new test environment.)

Member

We have already been doing this for a while to get logs.

Member Author

This PR just fixes the path where we try to pull data via SSH, so that it works in more failure cases. Previously we tried, but often got no data because no IP address had been set yet on the Machine's status object.
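For readers following this thread, the loop quoted above (try each address, collect the errors, stop at the first success) can be sketched as below. This is an illustrative approximation and not the collector's code: `runFunc` and `tryAllIPs` are hypothetical names, the SSH execution is abstracted behind a function value, and `errors.Join` (Go 1.20+) is used here for aggregation, which may differ from what the PR does.

```go
package main

import (
	"errors"
	"fmt"
)

// runFunc is a hypothetical stand-in for executing the remote command on one address.
type runFunc func(ip string) error

// tryAllIPs runs the command against each address in order and returns nil
// on the first success. If every attempt fails, all errors are returned joined.
func tryAllIPs(ips []string, run runFunc) error {
	if len(ips) == 0 {
		return errors.New("no IP addresses to try")
	}

	var errs []error
	for _, ip := range ips {
		if err := run(ip); err != nil {
			// Remember the failure and move on to the next address.
			errs = append(errs, fmt.Errorf("running command on %s: %w", ip, err))
			continue
		}
		// One address worked, so earlier failures no longer matter.
		return nil
	}
	return errors.Join(errs...)
}
```

Returning the joined error only when all addresses fail keeps the log output quiet in the common case while still showing every dial failure when nothing was reachable.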

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2024
@chrischdi
Member Author

/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main

@k8s-ci-robot
Contributor

k8s-ci-robot commented May 21, 2024

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-vsphere-e2e-supervisor-main | 84beaf1 | link | true | /test pull-cluster-api-provider-vsphere-e2e-supervisor-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@chrischdi chrischdi added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 21, 2024
@sbueringer
Member

Merging so we can rebase the bump PR. We can follow up if necessary.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f5b6ed18d888b28a85a12c2c00e7918083e69758

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2024
@k8s-ci-robot k8s-ci-robot merged commit 172dc9b into kubernetes-sigs:main May 21, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.11 milestone May 21, 2024
@k8s-infra-cherrypick-robot

@chrischdi: #3010 failed to apply on top of branch "release-1.9":

Applying: test: fix collector for machines not having an IP in status
Using index info to reconstruct a base tree...
A	test/e2e/data/infrastructure-vsphere-govmomi/main/ignition/kustomization.yaml
M	test/e2e/e2e_suite_test.go
A	test/framework/log/collector.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/log_collector.go
CONFLICT (content): Merge conflict in test/e2e/log_collector.go
Auto-merging test/e2e/e2e_suite_test.go
CONFLICT (content): Merge conflict in test/e2e/e2e_suite_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 test: fix collector for machines not having an IP in status
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.9


@k8s-infra-cherrypick-robot

@chrischdi: #3010 failed to apply on top of branch "release-1.8":

Applying: test: fix collector for machines not having an IP in status
Using index info to reconstruct a base tree...
A	test/e2e/data/infrastructure-vsphere-govmomi/main/ignition/kustomization.yaml
M	test/e2e/e2e_suite_test.go
A	test/framework/log/collector.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/log_collector.go
CONFLICT (content): Merge conflict in test/e2e/log_collector.go
Auto-merging test/e2e/e2e_suite_test.go
CONFLICT (content): Merge conflict in test/e2e/e2e_suite_test.go
Auto-merging test/e2e/data/infrastructure-vsphere/main/base/kustomization.yaml
CONFLICT (content): Merge conflict in test/e2e/data/infrastructure-vsphere/main/base/kustomization.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 test: fix collector for machines not having an IP in status
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.8


@k8s-infra-cherrypick-robot

@chrischdi: #3010 failed to apply on top of branch "release-1.7":

Applying: test: fix collector for machines not having an IP in status
Using index info to reconstruct a base tree...
A	test/e2e/data/infrastructure-vsphere-govmomi/main/ignition/kustomization.yaml
M	test/e2e/e2e_suite_test.go
A	test/framework/log/collector.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/log_collector.go
CONFLICT (content): Merge conflict in test/e2e/log_collector.go
Auto-merging test/e2e/e2e_suite_test.go
CONFLICT (content): Merge conflict in test/e2e/e2e_suite_test.go
Auto-merging test/e2e/data/infrastructure-vsphere/main/base/kustomization.yaml
CONFLICT (content): Merge conflict in test/e2e/data/infrastructure-vsphere/main/base/kustomization.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 test: fix collector for machines not having an IP in status
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.7


@k8s-infra-cherrypick-robot

@chrischdi: #3010 failed to apply on top of branch "release-1.10":

Applying: test: fix collector for machines not having an IP in status
Applying: remove direct dependency on kind
Using index info to reconstruct a base tree...
M	Makefile
M	test/go.mod
Falling back to patching base and 3-way merge...
Auto-merging test/go.mod
CONFLICT (content): Merge conflict in test/go.mod
Auto-merging Makefile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 remove direct dependency on kind
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.10

