Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ci): revert *_PKG_FILES on windows argoexec build #13568

Merged
merged 1 commit into from
Sep 6, 2024

Conversation

Joibel
Copy link
Member

@Joibel Joibel commented Sep 6, 2024

Fixes #13562

Motivation

The windows build in the release.yaml CI was broken by #13145, we need releases to work so this hack fixes it.

(CI should probably prove this is buildable rather than waiting for release to run.)

Modifications

In #13145 the dependencies for the binaries were fixed to be technically correct - as in the binaries depended upon the .go files that they were built from.

This had the side effect of causing pkg/apis/workflow/v1alpha1/openapi_generated.go to get rebuilt when building dist/argoexec. This is technically correct behaviour.

This fails when building on windows because windows has stupid limitations. This means that the Makefile step which trys to create a symlink fails with ln: failed to create symbolic link 'v3' -> '.': File name too long.

Attempting to enable LongPathsEnabled (What year is it?) doesn't help either inside or outside the container run.

So this PR runs the windows build with a hack reverting the dependencies when building the windows dist/argoexec only via a Makefile variable HACK_PKG_FILES_AS_PKGS.

I consider this "fine" as we don't directly support development on Windows, and the reverted behaviour there if anyone really wants to use it isn't that terrible (as in people lived with it for years before #13145 fixed it)

Verification

Verified here

@Joibel Joibel marked this pull request as ready for review September 6, 2024 09:53
@Joibel Joibel added the prioritized-review For members of the Sustainability Effort label Sep 6, 2024
@Joibel Joibel changed the title fix(ci): fix windows argoexec build fix(ci): fix windows argoexec release build Sep 6, 2024
@Joibel Joibel added the area/build Build or GithubAction/CI issues label Sep 6, 2024
@Joibel Joibel requested a review from agilgur5 September 6, 2024 09:55
@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

This fails when building on windows because windows has stupid limitations. This means that the Makefile step which trys to create a symlink fails with ln: failed to create symbolic link 'v3' -> '.': File name too long.

Attempting to enable LongPathsEnabled (What year is it?) doesn't help either inside or outside the container run.

Hmm my read of the error message was that . is problematic or something, since v3 and . are both short and so it's just giving an incorrect error message on some fallthrough case. That would explain why LongPathsEnabled has no effect.
Unless it's one of the files inside the directory being too long and Windows recreates or links them all (or something) during a symlink?

To be fair, the symlinking part of the build is potentially one of the weirdest pieces and can be problematic in other ways (e.g. parallelized operations per #12583 (comment)). I've been meaning to sit down with it and see if it can be refactored out.

@Joibel
Copy link
Member Author

Joibel commented Sep 6, 2024

. resolves to /Users/<something>/go/src/github.com/argoproj/argo-workflows, and ln -s appears when run from git bash under windows to effectively be akin to a cp -a. The result doesn't even look like a symlink, and it doesn't seem to behave like mklink.

I chose this way of doing it as expedient rather than perhaps the best choice. Potentially windows could be detected and mklink could be invoked. Docker under windows in a VM doesn't work for me (nested VMs) right now to play with this in a sane way.

@agilgur5 agilgur5 changed the title fix(ci): fix windows argoexec release build fix(ci): revert *_PKG_FILES on windows argoexec build Sep 6, 2024
CLI_PKG_FILES := $(shell go list -f '{{ join .Deps "\n" }}' ./cmd/argo/ | grep 'argoproj/argo-workflows/v3/' | xargs go list -f '{{ range $$file := .GoFiles }}{{ print $$.ImportPath "/" $$file "\n" }}{{ end }}' | cut -c 39-)
CONTROLLER_PKG_FILES := $(shell go list -f '{{ join .Deps "\n" }}' ./cmd/workflow-controller/ | grep 'argoproj/argo-workflows/v3/' | xargs go list -f '{{ range $$file := .GoFiles }}{{ print $$.ImportPath "/" $$file "\n" }}{{ end }}' | cut -c 39-)
HACK_PKG_FILES_AS_PKGS ?= false
ifeq ($(HACK_PKG_FILES_AS_PKGS),false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to detect Windows instead of having the Windows Dockerfile use an env var?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can detect Windows and do this the long way around we could attempt a mklink there too.

If I get access to some kind of windows that let's me play I might have a go, but I'd rather right now it was awkward and people who can fix it found this and tried to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can definitely detect Windows, in the same way you can detect a Linux OS, e.g. with uname or $OSTYPE. There are some variations on that (i.e. prior to WSL etc when Cygwin and MinGW were more used), but technically the main one we need to handle is the GH + Docker one, the rest a Windows contributor could figure out

@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

we don't directly support development on Windows

We do have contributors on Windows also, including Tianchu, per #12603. That issue also involves a symlink file in the codebase (different, but also symlink related).

But yes I agree that reverting to the old behavior isn't that harmful and is expedient

Copy link
Member

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to fix releases in the interim (as it is Friday and after hours in Alan's time). Can make more changes in further PRs

Thanks for rigorously testing this in your fork Alan!

@agilgur5 agilgur5 merged commit ed3b8ce into argoproj:main Sep 6, 2024
29 checks passed
@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

(CI should probably prove this is buildable rather than waiting for release to run.)

I was thinking about this too, this would require us to build the images for multiple architectures in CI, which will take more time (substantially so, the cross-compilation is quite slow). We could do that if we make the other steps not wait on it I suppose.

There's also other steps of the release process that aren't tested in CI, such as the manifest generation and signing, which did cause some problems before, e.g. #13424 (comment) / #13155

@agilgur5 agilgur5 added the area/windows Windows Container support label Sep 6, 2024
@Joibel
Copy link
Member Author

Joibel commented Sep 6, 2024

This one is an easy parallel step, which will only slightly damage the number of server hamsters each build uses, so shouldn't impact too much.

@agilgur5
Copy link
Member

agilgur5 commented Sep 6, 2024

The E2E tests depend on the image build, so the longest build of those will cause the longest delay. I pulled the numbers in #12449 (comment) and it's an order of magnitude longer to run all the cross-compilation for the image (~2min -> ~20min in best case. also the CLI release has even more cross-compilation that takes long too, but seems to be faster than an image build).

That's why I said that if we wanted to do this, it would have to be a separate, non-dependent step.

But also there's a limit to the amount of parallelism GH allows per account/org, so the more we add, the more likely we are to hit that limit and cause things to become serial as a result. We do seem to already hit this occasionally if we have many simultaneous PRs or if CD is doing a release (the GH jobs will be pending for a while before starting).

Also not sure if you missed it, but I did leave one comment on this PR specifically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Build or GithubAction/CI issues area/windows Windows Container support prioritized-review For members of the Sustainability Effort
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Image Releases are partially failing on main
2 participants