distinguish github_actions_labels for CPU vs. CUDA builds #6910
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
zip_keys:
  -                             # [unix]
    - c_compiler_version        # [unix]
    - cxx_compiler_version      # [unix]
    - fortran_compiler_version  # [unix]
    - cuda_compiler             # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
these zip keys have become almost impossible to read.
Not sure what alternative you'd have in mind - unless we completely redesign how conda works (or stop supporting CUDA builds), this is irreducible complexity IMO
I think we can start with: #6917
I would prefer that users of the GPU runners throttle themselves instead of adding extra complexity. The GPU runners aren't necessary in many cases. They also don't really guarantee that the builds don't fail (due to flaky tests). They can "throttle" by falling back to the CPU runners and rerendering.
IMO flaky tests can be skipped, and I'd much rather have an extra entry in the zip than sacrifice testing fidelity. The ideal would be to build on CPU agents and only test the GPU bits on a GPU runner, but given that that's not yet possible and pytorch has chosen to also test the GPU paths, I think the best (= least bad) tradeoff would be what this PR proposes. The CUDA 11.8 discussion is IMO unrelated - I'm not opposed, but I think it should be had on its own merits.
It's partially related. We are doing all this work and jumping through hoops for CUDA builds. Some of the old tricks need to go, to make it possible for us to think of new ones.
The only two feedstocks using GPU runners are pytorch and magma. For magma, I never got around to enabling any tests that need GPUs, so I can move that feedstock to the CPU runners if that helps. Though, TBH, magma is rarely built anymore. My understanding is that the problem you're trying to solve is that pytorch is competing with itself for GPU runners because ALL of the builds are assigned to GPU runners (even the non-GPU builds). Is that correct?
Exactly.
Wonder if it is worth reconsidering. We had tried that before; however, enough is different between then and now that we could do things that were not previously possible. For example, we can store artifacts on CI, so using that for caching is possible (without needing to create it from scratch).
I think this will work, but as you mentioned, it will break any of the 7 existing feedstocks that use the openstack server for CUDA builds. They will need to update their conda_build_config.yaml to have three items in github_actions_labels; otherwise, they will get a rendering error about the length of the zip not being equal.
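For a CUDA-enabled feedstock on that server, such an update could look roughly like the following sketch (the entry count and order must match the cuda_compiler_version zip in the global pinning, assumed here to be None / 11.8 / 12.x; the GPU label name is illustrative):

```yaml
# Hypothetical conda_build_config.yaml override for a CUDA-enabled feedstock;
# entries must line up with the cuda_compiler_version zip in the pinning
# (assumed here: None, 11.8, 12.x). Label names are illustrative.
github_actions_labels:          # [linux]
# non-CUDA build -> CPU runner
- cirun-openstack-cpu-large     # [linux]
# CUDA builds -> GPU runner
- cirun-openstack-gpu-large     # [linux]
- cirun-openstack-gpu-large     # [linux]
```

A CPU-only feedstock would instead repeat its CPU label three times, which is what the jaxlib diff further down ends up doing.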
@hmaarrfk, could you live with this for now?
Which feedstocks are you planning to use it for? If it's a small set, you could do it as a migrator.
Also, is it possible to only set this in the feedstock?
Unfortunately not, because it's not possible to extend an existing zip_key (I tried). Cf. the POC branch that was rendered with the CBC from this PR.
Primarily pytorch and tensorflow for now.
We could just do skip: True  # [linux and (cuda_compiler_version == "None") == (github_actions_labels == "cirun-openstack-gpu-placeholder")] after adding github_actions_labels in the feedstock.
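A sketch of what this workaround presumably amounts to (the elided snippet above is not preserved, so the label list and its placement are assumptions): the labels are added un-zipped, conda-build then expands the full CUDA x label matrix, and the skip prunes the mismatched combinations.

```yaml
# Assumed feedstock conda_build_config.yaml addition (not the original,
# elided snippet): two labels, deliberately NOT part of the zip.
github_actions_labels:               # [linux]
- cirun-openstack-cpu-large          # [linux]
- cirun-openstack-gpu-placeholder    # [linux]
```

and in recipe/meta.yaml:

```yaml
build:
  # prune the CPU-build-on-GPU-label and CUDA-build-on-CPU-label combinations
  skip: True  # [linux and (cuda_compiler_version == "None") == (github_actions_labels == "cirun-openstack-gpu-placeholder")]
```

This is also where the later complaint about superfluous CI workflows comes from: every variant gets a twin with the other label, and that twin only gets skipped once conda-build runs.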
That seems pretty hacky to me (aside from the fact that smithy does not prune the jobs correctly - I tested it). I mean, if the resistance to this proposal is strong, I'm not gonna die on this hill, but to me it seems natural to add the runner type (CPU vs. CUDA) to the same zip that decides whether it's a CPU or a CUDA build.
Looking at Pytorch v2.4.x (conda-forge/pytorch-cpu-feedstock#325):
So to save 6 out of 42 hours, we are adding complexity for anybody zipping keys at conda-forge. It just doesn't seem smart to me. I'm not going to vote for this approach, but I don't want to block it. I'm just a little tired of fighting with these zipped keys. Let's keep it simple and not take precious human resources to discuss this. IMO: just request the CPU runners on pytorch for things like the 2.4.x maintenance branch and one should be fine. We could also create ways to temporarily reduce the CUDA target architectures during debugging sessions. These are "pytorch only" solutions that don't need to be imposed on the rest of conda-forge.
TL;DR: an extra zip entry costs ~nothing, but without it we'd pay the extra runtime for every CI run.

For pytorch main, the calculus is 3x CPU + 3x CUDA, and the aarch CPU build got longer because we're running parts of the test suite in emulation. So it's something like 12h of runtime out of a total of 39, or almost a third. I don't care much what we do on the maintenance branches, but main should be relatively pain-free to rebuild, and IMO we shouldn't accept an avoidable ~50% increase in runtime. The simple fact that the GPU agents are essentially our rarest and most precious CI resource would IMO be enough reason to do this - even if it were only for pytorch (which it isn't, because tensorflow will be in the same situation and has an even bigger matrix).
This affects a grand total of 5 feedstocks in conda-forge, 2 of which would benefit from it. 🤷 Overall, if this is so contentious, then I'll just go with Isuru's suggestion - but then don't start complaining about the superfluous CI workflows that this creates (half of which get skipped as soon as they hit conda-build) -- I find that quite a bit more painful than an extra zip entry, but de gustibus non est disputandum I guess...
FWIW, I'm happy to help out with issues on these things if you ping me. I haven't run into any unsolvable cases yet (well, aside from extending the zip locally, which conda-build doesn't support).
Welp, given the concerns here I was planning to use Isuru's workaround, but on the pytorch feedstock that actually breaks Windows.
Perhaps this is just a question of needing to add … on the Windows server? Any chance we could try to add that @baszalmstra @wolfv?
I don’t want to block this. Please disregard my concerns.
We should probably reduce …
Thanks for the proposition. I tested this in conda-forge/pytorch-cpu-feedstock#332, but even with some more careful changes in conda-forge/conda-smithy#2233, this either shortens the variant configs to the point that the key attributes (is it a CUDA build?) wouldn't be visible from the filename anymore, or still runs into issues with the filename lengths.

Despite the fact that the enthusiasm here is not exactly high, I think this is the right way to go about it conceptually (the zip that determines CUDA/non-CUDA is the canonical place for this distinction), and that the work-arounds are worse (filename length issues, aside from pointlessly doubling the number of linux jobs, etc.) than what they're meant to avoid (a single extra zip entry). So I'll merge this and fix up the handful of affected feedstocks. The changes are minimal, e.g. I tested that this works for jaxlib:

--- a/recipe/conda_build_config.yaml
+++ b/recipe/conda_build_config.yaml
@@ -8,5 +8,8 @@ MACOSX_SDK_VERSION: # [osx and x86_64]
 - '10.14' # [osx and x86_64]
 c_stdlib_version: # [osx and x86_64]
 - '10.14' # [osx and x86_64]
-github_actions_labels:
-- cirun-openstack-cpu-large
+
+github_actions_labels: # [linux]
+- cirun-openstack-cpu-large # [linux]
+- cirun-openstack-cpu-large # [linux]
+- cirun-openstack-cpu-large # [linux]
Please don't merge things like this on your own without approval.
There were other approvals, and Mark withdrew his objections. You proposed alternatives but didn't request changes, much less provide arguments why this would be wrong. I'm willing to do the required clean-ups on the 5 affected feedstocks too, so this is a non-issue, unless you have a very concrete argument why this is the wrong approach.
Also: I've tried all the suggested alternatives beforehand as well, and they were worse than what they were working around.
I still share Mark's objection. The zip_keys for docker_image and related ones are getting out of hand. I'd like to one day remove …
I agree with this goal! And there's nothing standing in the way there AFAICT - as soon as we drop CUDA 11.8, we'll be able to drop …
Sure, but this …
This is a conversation we need to have in an issue, and I think the best course of action is to revert this PR, open an issue, and if needed discuss it in a core meeting.
We recently removed … If we keep using the newest CUDA, we might even get to drop …
Wrote up an issue: #6967
Only a few feedstocks have moved to the cirun open-gpu server, and even fewer are using GPU agents (basically tensorflow & pytorch). Those are the rarest resource (currently also at halved capacity due to some issues with the physical GPUs in the server).
Additionally, we cannot yet separate build and test phases into different agents, c.f. conda-forge/pytorch-cpu-feedstock#314 & conda-forge/conda-smithy#1472
As a consequence, a given pytorch PR currently takes at least 24h to complete, and that's if there are no competing GPU builds. To alleviate the pressure on the rarest resource, I'd like to zip github_actions_labels into the cuda_compiler zip, because this would allow us to at least choose CPU agents for the non-CUDA builds. An example based on this PR can be found on this branch.

This would very likely break the rendering of existing feedstocks on the open-gpu server, but there are currently only 7 of those, of which only 5 are cuda-enabled (and it won't concern CPU-only feedstocks due to being gated on CF_CUDA_ENABLED), so IMO this should be manageable.

@conda-forge/pytorch-cpu
@conda-forge/tensorflow
@conda-forge/jaxlib
@conda-forge/libmagma
@conda-forge/flash-attn
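For illustration, a minimal sketch of the pairing this proposal amounts to in the relevant conda_build_config.yaml (the CUDA versions and label names are assumptions; the real values live in the global pinning and in the per-feedstock overrides):

```yaml
# Sketch only: with github_actions_labels in the same zip as the CUDA axis,
# each zip slot carries its own runner type (values are illustrative).
cuda_compiler_version:
# CPU-only variant
- None
# CUDA variants (versions assumed)
- 11.8
- 12.6
github_actions_labels:
# the non-CUDA build can run on a CPU agent ...
- cirun-openstack-cpu-large
# ... while only the CUDA builds occupy the scarce GPU agents
- cirun-openstack-gpu-large
- cirun-openstack-gpu-large
```

With that pairing in place, an affected feedstock only needs to keep its own github_actions_labels override the same length as the zip, as in the jaxlib diff above.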