convert some jobs to Ginkgo --label-filter #32648

Merged
merged 2 commits into from
Jun 16, 2024

pohly
Contributor

@pohly pohly commented May 23, 2024

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/config Issues or PRs related to code in /config size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 23, 2024
@@ -1067,7 +1068,7 @@ presubmits:
 - '--node-test-args=--container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'
 - --node-tests=true
 - --provider=gce
-- --test_args=--nodes=8 --focus="\[NodeConformance\]|\[NodeFeature:.+\]|\[NodeFeature\]" --skip="\[Flaky\]|\[Slow\]|\[Serial\]"
+- --test_args=--nodes=8 --label-filter='(NodeConformance || !(NodeFeature: isEmpty)) && !Flaky && !Slow && !Serial'
Contributor Author

This ran 353 tests with old and new args.

@@ -702,7 +702,7 @@ presubmits:
 - '--node-test-args=--feature-gates=SidecarContainers=true --service-feature-gates=SidecarContainers=true --container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime-process-name=/usr/bin/containerd --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service" --extra-log="{\"name\": \"containerd.log\", \"journalctl\": [\"-u\", \"containerd*\"]}"'
 - --node-tests=true
 - --provider=gce
-- --test_args=--nodes=1 --timeout=4h --focus="\[Serial\].*\[NodeFeature:SidecarContainers\]|\[NodeFeature:SidecarContainers\].*\[Serial\]" --skip="\[Flaky\]|\[Benchmark\]|\[NodeSpecialFeature:.+\]|\[NodeSpecialFeature\]|\[NodeFeature:Eviction\]"
+- --test_args=--nodes=1 --timeout=4h --label-filter='Serial && NodeFeature: containsAny SidecarContainers && !Flaky && !Benchmark && NodeSpecialFeature: isEmpty && !(NodeFeature: containsAny Eviction)'
Contributor Author

4 tests with old and new flags:

kubernetes$ _output/bin/ginkgo --dry-run --silence-skips -v --label-filter='Serial && NodeFeature: containsAny SidecarContainers && !Flaky && !Benchmark && NodeSpecialFeature: isEmpty && !(NodeFeature: containsAny Eviction)' ./test/e2e_node/
...
[sig-node] Device Plugin [Feature:DevicePluginProbe] [NodeFeature:DevicePluginProbe] [Serial] DevicePlugin [Serial] [Disruptive] Can schedule a pod with a restartable init container [NodeFeature:SidecarContainers] [sig-node, Feature:DevicePluginProbe, NodeFeature:DevicePluginProbe, Serial, Disruptive, NodeFeature:SidecarContainers]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e_node/device_plugin_test.go:610
• [0.000 seconds]
------------------------------
[sig-node] CPU Manager [Serial] [Feature:CPUManager] With kubeconfig updated with static CPU Manager policy run the CPU Manager tests should not reuse CPUs of restartable init containers [NodeFeature:SidecarContainers] [sig-node, Serial, Feature:CPUManager, NodeFeature:SidecarContainers]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e_node/cpu_manager_test.go:713
• [0.000 seconds]
------------------------------
[sig-node] [NodeFeature:SidecarContainers] [Serial] Containers Lifecycle should restart the containers in right order after the node reboot [sig-node, NodeFeature:SidecarContainers, Serial]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e_node/container_lifecycle_test.go:3128
• [0.000 seconds]
------------------------------
[sig-node] POD Resources [Serial] [Feature:PodResources] [NodeFeature:PodResources] with SRIOV devices in the system with CPU manager Static policy  should return the expected responses [NodeFeature:SidecarContainers] [sig-node, Serial, Feature:PodResources, NodeFeature:PodResources, NodeFeature:SidecarContainers]
/nvme/gopath/src/k8s.io/kubernetes/test/e2e_node/podresources_test.go:901
• [0.000 seconds]
------------------------------
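To make the label-set operators in this filter concrete, here is a minimal bash sketch that evaluates the same expression against plain lists of labels. It is an illustration of the semantics, not Ginkgo's implementation: it assumes exact, case-sensitive matching, supports only one value per containsAny, and the helper names (has, set_values, is_empty, contains_any, selected) are hypothetical. The CPU Manager labels are copied from the dry-run output above; the variant without Serial is made up.

```shell
#!/usr/bin/env bash
# Sketch of the label filter from the SidecarContainers job:
#   Serial && NodeFeature: containsAny SidecarContainers && !Flaky &&
#   !Benchmark && NodeSpecialFeature: isEmpty &&
#   !(NodeFeature: containsAny Eviction)
# Exact, case-sensitive matching; an illustration, not Ginkgo's code.

# has LABEL LABELS...: succeeds if LABEL is one of the plain labels.
has() {
  local want=$1 label
  shift
  for label in "$@"; do
    [ "$label" = "$want" ] && return 0
  done
  return 1
}

# set_values KEY LABELS...: prints the value of every "KEY:value" label.
set_values() {
  local key=$1 label
  shift
  for label in "$@"; do
    case $label in
      "$key":*) printf '%s\n' "${label#"$key":}" ;;
    esac
  done
}

# KEY: isEmpty -> there is no "KEY:..." label at all.
is_empty() {
  [ -z "$(set_values "$@")" ]
}

# KEY: containsAny VALUE -> some "KEY:VALUE" label exists (one value only).
contains_any() {
  local key=$1 value=$2
  shift 2
  set_values "$key" "$@" | grep -xF -- "$value" >/dev/null
}

# The whole filter, evaluated for one spec's labels.
selected() {
  has Serial "$@" &&
    contains_any NodeFeature SidecarContainers "$@" &&
    ! has Flaky "$@" &&
    ! has Benchmark "$@" &&
    is_empty NodeSpecialFeature "$@" &&
    ! contains_any NodeFeature Eviction "$@"
}

# Labels of the CPU Manager spec from the dry run; the second list is made up.
cpu_manager=(sig-node Serial Feature:CPUManager NodeFeature:SidecarContainers)
not_serial=(sig-node Feature:CPUManager NodeFeature:SidecarContainers)

selected "${cpu_manager[@]}" && echo "CPU Manager spec: selected"
selected "${not_serial[@]}" || echo "variant without Serial: skipped"
```

The "KEY:" prefix turns a group of labels into a set of values, which is why NodeSpecialFeature: isEmpty holds for any spec that has no NodeSpecialFeature:... label at all, without needing a regex like \[NodeSpecialFeature:.+\].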

nice - this is much easier to read and grok.

Member

indeed

Contributor Author

Quoting turned out to be a bit tricky: without quotes around the string, YAML treats it as a mapping (an object) because of the ": " inside the filter expression.

Not a big problem, just something to remember...

 - name: SKIP
-  value: \[Slow\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]|PodSecurityPolicy|LoadBalancer|load.balancer|In-tree.Volumes.\[Driver:.nfs\]|PersistentVolumes.NFS|Network.should.set.TCP.CLOSE_WAIT.timeout|Simple.pod.should.support.exec.through.an.HTTP.proxy|subPath.should.support.existing|should.provide.basic.identity
+  value: PodSecurityPolicy|LoadBalancer|load.balancer|In-tree.Volumes.\[Driver:.nfs\]|PersistentVolumes.NFS|Network.should.set.TCP.CLOSE_WAIT.timeout|Simple.pod.should.support.exec.through.an.HTTP.proxy|subPath.should.support.existing|should.provide.basic.identity
Contributor Author

@pohly pohly May 23, 2024

One small difference is that tests which depend only on an alpha or beta feature gate are now allowed to run. However, we currently don't have any such tests, because such a test also has to add a Feature:<feature gate name> label to be skipped in normal jobs.

This runs 2322 tests with the old and new flags.
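The difference described in this comment can be illustrated with a small bash sketch that applies the old SKIP regex to a spec's full name and the new LABEL_FILTER expression !Slow && !Disruptive && !Flaky && Feature: isSubsetOf Alpha to its labels. Assumptions: only the part of the old SKIP regex that the label filter replaces is reproduced, a test with just an alpha feature gate dependency is assumed to carry a Feature:Alpha label (as the filter suggests), label matching is simplified to exact comparisons, and all spec names and labels are hypothetical.

```shell
#!/usr/bin/env bash
# The part of the old SKIP regex that the new LABEL_FILTER replaces.
old_skip='\[Slow\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]'

# Old: a spec runs unless its full name matches the SKIP regex.
old_selected() {
  ! printf '%s\n' "$1" | grep -E -- "$old_skip" >/dev/null
}

# New: !Slow && !Disruptive && !Flaky && Feature: isSubsetOf Alpha,
# evaluated with exact, case-sensitive label comparisons (a simplification).
new_selected() {
  local label
  for label in "$@"; do
    case $label in
      Slow | Disruptive | Flaky) return 1 ;;
      Feature:Alpha) ;;      # Alpha is the only value allowed in the Feature set
      Feature:*) return 1 ;; # any other Feature value breaks isSubsetOf
    esac
  done
  return 0 # an empty Feature set is also a subset of {Alpha}
}

# check NAME LABELS...: print both decisions for one hypothetical spec.
check() {
  local name=$1 old=skip new=skip
  shift
  old_selected "$name" && old=run
  new_selected "$@" && new=run
  printf '%-42s old=%-4s new=%s\n' "$name" "$old" "$new"
}

check '[sig-apps] plain test' sig-apps
check '[sig-node] alpha test [Feature:Alpha]' sig-node Feature:Alpha
check '[sig-node] gated test [Feature:SomeGate]' sig-node Feature:SomeGate
check '[sig-storage] slow test [Slow]' sig-storage Slow
```

The "alpha test" row is the small difference: the old \[Feature:.+\] pattern skips it, while Feature: isSubsetOf Alpha lets it run. Specs with any other Feature: value are still skipped by both.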

- name: FOCUS
value: "."
- name: LABEL_FILTER
value: !Slow && !Disruptive && !Flaky && Feature: isSubsetOf Alpha
Contributor Author

kubernetes-sigs/kind#3582 got merged. Does that mean that it is usable now?

In other words, can we merge this PR?

/cc @aojea @BenTheElder

@pohly pohly changed the title WIP: convert jobs to Ginkgo --label-filter convert some jobs to Ginkgo --label-filter Jun 13, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 13, 2024
The new --label-filter expression is easier.
One small difference is that tests with just alpha or beta feature gate
dependency are allowed to run. However, we currently don't have any such
test because any such test also has to add a Feature:<feature gate name>
label to be skipped in normal jobs.
@aojea
Member

aojea commented Jun 16, 2024

/lgtm
/approve

@pohly do you mind sending an update to the mailing list with the existing changes and a brief explanation of the benefits?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit 128a653 into kubernetes:master Jun 16, 2024
7 checks passed
@k8s-ci-robot
Contributor

@pohly: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key dynamic-resource-allocation.yaml using file config/jobs/kubernetes/sig-node/dynamic-resource-allocation.yaml
  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml
  • key kubernetes-kind.yaml using file config/jobs/kubernetes/sig-testing/kubernetes-kind.yaml

In response to this:

The new --label-filter expression is easier.

Depends on:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -32,7 +32,7 @@ periodics:
 kind build node-image --image=dra/node:latest . &&
 trap 'kind export logs "${ARTIFACTS}/kind"; kind delete cluster' EXIT &&
 kind create cluster --retain --config test/e2e/dra/kind.yaml --image dra/node:latest &&
-KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} hack/ginkgo-e2e.sh -ginkgo.focus=DynamicResourceAllocation -ginkgo.skip=\[Serial\]
+KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=${ARTIFACTS} hack/ginkgo-e2e.sh -ginkgo.filter='Feature: containsAny DynamicResourceAllocation && !Serial'
Contributor Author

Of course, now that I look at this one last time after merging, I spot a typo! 🥵

s/-ginkgo.filter/-ginkgo.label-filter/

@@ -188,7 +188,7 @@ periodics:
 - '--node-test-args=--feature-gates="DynamicResourceAllocation=true" --service-feature-gates="DynamicResourceAllocation=true,SchedulerQueueingHints=true" --runtime-config=resource.k8s.io/v1alpha2=true --container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'
 - --node-tests=true
 - --provider=gce
-- --test_args=--focus="\[Feature:DynamicResourceAllocation\]" --skip="\[Flaky\]"
+- "--test_args=--label-filter='Feature: containsAny DynamicResourceAllocation && !Flaky'"
Contributor Author

e2e-node tests do not preserve white space properly 😢

From https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/124548/pull-kubernetes-unit/1802615829762674688:

I0617 08:30:21.206725    9333 ssh.go:146] Running the command ssh, with args: [-o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o CheckHostIP=no -o StrictHostKeyChecking=no -o ServerAliveInterval=30 -o LogLevel=ERROR -i /workspace/.ssh/google_compute_engine [email protected] -- sudo /bin/bash -c 'cd /tmp/node-e2e-20240617T083009 && set -o pipefail; timeout -k 30s 3900.000000s ./ginkgo --label-filter='Feature: containsAny DynamicResourceAllocation && !Flaky' --no-color -v ./e2e_node.test -- --system-spec-name= --system-spec-file= --extra-envs= --runtime-config= --v 4 --node-name=n1-standard-4-fedora-coreos-40-20240519-3-0-gcp-x86-64-54709c74 --report-dir=/tmp/node-e2e-20240617T083009/results --report-prefix=fedora --image-description="fedora-coreos-40-20240519-3-0-gcp-x86-64" --feature-gates="DynamicResourceAllocation=true" --service-feature-gates="DynamicResourceAllocation=true" --runtime-config=resource.k8s.io/v1alpha2=true --container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}" 2>&1 | tee -i /tmp/node-e2e-20240617T083009/results/n1-standard-4-fedora-coreos-40-20240519-3-0-gcp-x86-64-54709c74-ginkgo.log']
E0617 08:30:21.848227    9333 ssh.go:149] failed to run SSH command: out: ginkgo run failed
  Found no test suites
, err: exit status 1

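In this case, it's a limitation of ssh: ssh joins its command arguments into one string, and the remote shell parses that string again. The single quotes around the label filter therefore end the outer bash -c '...' quoting early, and the filter gets split into several words. Still, that is something that the e2e_node tests should be aware of. The solution for "run complex shell commands via ssh" is to invoke ssh /bin/sh and then feed it the commands on stdin.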

Will prepare a fix.
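The ssh limitation and the stdin idea can be reproduced locally with a small bash sketch. fake_ssh is a hypothetical stand-in that, like ssh, joins its arguments with spaces and hands the result to a shell (a local sh -c instead of a remote login shell), and printf stands in for ginkgo by printing each argument it receives as <arg>. For the stdin approach, a local sh -s stands in for ssh host /bin/sh; no real host is involved.

```shell
#!/usr/bin/env bash
# Hypothetical ssh stand-in: join the arguments with spaces, then let a shell
# parse the resulting string again, as the remote login shell would.
fake_ssh() {
  sh -c "$*"
}

filter='Feature: containsAny DynamicResourceAllocation && !Flaky'

# Passed as an ssh argument, the filter is re-split by the "remote" shell and
# "&& !Flaky" even becomes a separate, failing command (stderr is dropped).
via_args=$(fake_ssh printf "'<%s>\n'" "--label-filter=$filter" 2>/dev/null)

# Fed as a script on stdin, the text reaches the shell unchanged and is parsed
# exactly once, so a single level of quoting is enough.
IFS= read -r script <<'EOF'
printf '<%s>\n' --label-filter='Feature: containsAny DynamicResourceAllocation && !Flaky'
EOF
via_stdin=$(printf '%s\n' "$script" | sh -s)

printf 'via arguments:\n%s\n\nvia stdin:\n%s\n' "$via_args" "$via_stdin"
```

With arguments, the stand-in for ginkgo only sees --label-filter=Feature: plus two stray words, which matches the truncated filter in the log above; via stdin it receives the whole expression as one argument.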

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that this occurs so deep down that properly composing the input script has the same problems.

I guess it boils down to "use double quotes" for --test_args - let's try with that: #32774
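A similar local sketch shows why double quotes help when the test command is itself wrapped in bash -c '...', as in the logged command (sudo and the remaining flags are left out). fake_ssh is again a hypothetical stand-in for ssh and printf stands in for ginkgo. This only illustrates the quoting rules; it is not the change from #32774.

```shell
#!/usr/bin/env bash
# Hypothetical ssh stand-in: join the arguments, then parse them again.
fake_ssh() {
  sh -c "$*"
}

# Single quotes around the filter close the outer '...' of bash -c early.
IFS= read -r single_quoted <<'EOF'
bash -c 'printf "<%s>\n" --label-filter='Feature: containsAny DynamicResourceAllocation && !Flaky' --no-color'
EOF

# Double quotes nest inside the outer single quotes and survive both parses.
IFS= read -r double_quoted <<'EOF'
bash -c 'printf "<%s>\n" --label-filter="Feature: containsAny DynamicResourceAllocation && !Flaky" --no-color'
EOF

# In the broken case the rest of the filter turns into stray words and a
# failing "&&" command; its error message is discarded here.
broken=$(fake_ssh "$single_quoted" 2>/dev/null)
fixed=$(fake_ssh "$double_quoted")

printf 'single quotes:\n%s\n\ndouble quotes:\n%s\n' "$broken" "$fixed"
```

Double quotes also keep && and ! literal inside bash -c, because a non-interactive bash does not perform history expansion.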
