ci: fix flaky image gc and mysql connect + add k3s debug log #13660
The `E2E Tests (test-executor, v1.28.13+k3s1, minimal, false)` test has been flaky for a while and keeps failing with the error `ErrImageNeverPull: Container image "quay.io/argoproj/argocli:latest" is not present with pull policy of Never`. This shouldn't be happening because k3s should be using cri-dockerd as the container runtime and the "Load images" step handles loading that image into Docker. There were changes to cri-dockerd recently (Mirantis/cri-dockerd#373) that might be related, but it's impossible to tell without the logs.

Signed-off-by: Mason Malone <[email protected]>
This is a follow-up to this Slack thread where Mason correlated this to #13600 being merged. @meln5674 mentioned this might be due to image GC in #13641 (comment). Seems like you saw that in the logs too, per the thread.
Thanks for tracking this down Mason!
Approving this so we can unblock CI, but I do have some small comments below that we can resolve in a follow-up PR
@MasonM thanks for finding and fixing this.
This addresses the comments from argoproj#13660.

Also, it hopefully fixes the flaky `CI / Windows Unit Tests (pull_request)` test suite. The errors indicate it's trying to write temp files to `/tmp`:

```
--- FAIL: TestArtifactoryArtifactDriver_Load/Found (0.00s)
    http_test.go:75:
        Error Trace: D:/a/argo-workflows/argo-workflows/workflow/artifacts/http/http_test.go:75
        Error: Received unexpected error:
               open /tmp/found: The system cannot find the path specified.
        Test: TestArtifactoryArtifactDriver_Load/Found
```

which obviously isn't the right directory under Windows, but the test does pass sometimes, and it seems like writing to the wrong directory would cause consistent failures. Regardless, the tests should be using `os.CreateTemp()` for this anyway.

Signed-off-by: Mason Malone <[email protected]>
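To illustrate the `os.CreateTemp()` suggestion in the commit message above, here is a minimal sketch of writing a file into the platform's temp directory instead of a hard-coded `/tmp`. The helper name `writeTempFile` and the `found-*` pattern are assumptions made up for this example; they are not taken from the actual test code in `http_test.go`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeTempFile is a hypothetical helper (not from the PR). It creates a
// uniquely named file under dir, or under os.TempDir() when dir is empty,
// writes data to it, and returns the file's path.
func writeTempFile(dir, pattern string, data []byte) (string, error) {
	f, err := os.CreateTemp(dir, pattern)
	if err != nil {
		return "", err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(f.Name())
		return "", err
	}
	if err := f.Close(); err != nil {
		os.Remove(f.Name())
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTempFile("", "found-*", []byte("hello"))
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	defer os.Remove(path)
	// The directory is platform-dependent, so no path is hard-coded here.
	fmt.Println("wrote", filepath.Base(path), "in", filepath.Dir(path))
}
```

Inside a Go test, `t.TempDir()` is another option: it returns a per-test directory that the testing package removes when the test finishes. Which of the two fits the affected tests better is a judgment call this sketch does not settle.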
Motivation
A bunch of tests have been failing intermittently for a while now, which is blocking PRs. For example, the `E2E Tests (test-executor, v1.28.13+k3s1, minimal, false)` test fails >50% of the time with the error `ErrImageNeverPull: Container image "quay.io/argoproj/argocli:latest" is not present with pull policy of Never.` (example run). Many of these issues can't be diagnosed without access to the k3s logs.

Modifications
- Dump the k3s logs with `journalctl` on a build failure. You can see k3s is being run with systemd from the output of the `Install and start K3S` step.
- Run `make wait PROFILE=mysql` to wait for mysql to be available, which should fix intermittent failures like `persistence.go:29: test panicked: dial tcp [::1]:3306: connect: connection refused`.
- Fix the flaky image GC behavior associated with the `ErrImageNeverPull` errors.

Verification
Will watch action output
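The idea behind waiting for MySQL before running the tests can be sketched as a simple dial-and-retry loop. This is only an illustration of the general technique, not what `make wait PROFILE=mysql` actually does; the helper name `waitForTCP`, the address `localhost:3306` (taken from the port in the error message), and the timeout values are assumptions.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForTCP is a hypothetical helper: it dials addr repeatedly until a
// connection succeeds or the overall timeout elapses.
func waitForTCP(addr string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, interval)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not reachable after %s: %w", addr, timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Assumed address; adjust to wherever the database is forwarded.
	if err := waitForTCP("localhost:3306", 30*time.Second, time.Second); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("port is accepting connections")
}
```

Note that an open port only shows the server is accepting TCP connections. It does not prove the database is ready to serve queries, so a stricter check may still be needed.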