
distinguish github_actions_labels for CPU vs. CUDA builds #6910

Merged: 2 commits into conda-forge:main, Jan 26, 2025

Conversation

h-vetinari (Member)

Some feedstocks have moved to the cirun open-gpu server, and even fewer are using GPU agents (basically tensorflow & pytorch). Those are the rarest resource (currently also at halved capacity due to some issues with the physical GPUs in the server).

Additionally, we cannot yet separate the build and test phases onto different agents; cf. conda-forge/pytorch-cpu-feedstock#314 & conda-forge/conda-smithy#1472

As a consequence, a given pytorch PR currently takes more than 24h to complete, and that's if there are no competing GPU builds. To alleviate the pressure on this rarest resource, I'd like to zip github_actions_labels into the cuda_compiler zip, because this would allow us to at least choose CPU agents for the non-CUDA builds. An example based on this PR can be found on this branch.

This would very likely break the rendering of existing feedstocks on the open-gpu server, but there are currently only 7 of those, out of which only 5 are cuda-enabled (CPU-only feedstocks won't be affected, because the change is gated on CF_CUDA_ENABLED), so IMO this should be manageable.
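Roughly, a CUDA-enabled feedstock could then pick a CPU agent for its non-CUDA builds simply by listing one label per CUDA slot, along these lines (a sketch only; the labels and the None / 11.8 / 12.6 slot order are illustrative, see the linked branch for the real change):

github_actions_labels:          # [linux]
# one entry per cuda_compiler_version slot (None, 11.8, 12.6):
- cirun-openstack-cpu-2xlarge   # [linux]
- cirun-openstack-gpu-2xlarge   # [linux]
- cirun-openstack-gpu-2xlarge   # [linux]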

@conda-forge/pytorch-cpu
@conda-forge/tensorflow
@conda-forge/jaxlib
@conda-forge/libmagma
@conda-forge/flash-attn

@h-vetinari requested a review from a team as a code owner on January 10, 2025, 15:43
@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

zip_keys:
- # [unix]
- c_compiler_version # [unix]
- cxx_compiler_version # [unix]
- fortran_compiler_version # [unix]
- cuda_compiler # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- cuda_compiler_version # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
- github_actions_labels # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
Contributor:

these zip keys have become almost impossible to read.

Member Author (h-vetinari):

Not sure what alternative you'd have in mind - unless we completely redesign how conda works (or stop supporting CUDA builds), this is irreducible complexity IMO

Contributor:

I think we can start with: #6917

@hmaarrfk (Contributor)

I would prefer that users of the GPU runners throttle themselves instead of adding extra complexity.

The GPU runners aren't necessary in many cases. They also don't really guarantee that the builds don't fail (due to flaky tests).

They can "throttle" by falling back to the CPU runners and rerendering.

@h-vetinari (Member Author)

IMO flaky tests can be skipped, and I'd much rather have an extra entry in the zip than sacrifice testing fidelity. The ideal would be to build on CPU agents and only test the GPU bits on a GPU runner, but given that that's not yet possible and pytorch has chosen to also test the GPU paths, I think the best (= least bad) tradeoff would be what this PR proposes.

The CUDA 11.8 discussion is IMO unrelated - I'm not opposed but I think it should be had on its own merits.

@hmaarrfk (Contributor)

> The CUDA 11.8 discussion is IMO unrelated - I'm not opposed but I think it should be had on its own merits.

It's partially related. We are doing all this work and jumping through hoops for CUDA builds. Some of the old tricks need to go to make it possible for us to think of new ones.

recipe/conda_build_config.yaml: outdated review thread (resolved)
@carterbox (Member)

The only two feedstocks using GPU runners are pytorch and magma. For magma, I never got around to enabling any tests that need GPUs, so I can move that feedstock to the CPU runners if that helps. Though, TBH, magma is rarely built anymore.

My understanding is that the problem you're trying to solve is that pytorch is competing with itself for GPU runners because ALL of the builds are assigned to GPU runners (even the non-GPU builds). Is that correct?

@h-vetinari (Member Author)

> My understanding is that the problem you're trying to solve is that pytorch is competing with itself for GPU runners because ALL of the builds are assigned to GPU runners (even the non-GPU builds). Is that correct?

Exactly.

@jakirkham (Member)

Wonder if it is worth reconsidering ccache or similar (conda-forge/conda-forge.github.io#389).

We had tried that before. However, enough is different between then and now that we could do things that were not previously possible.

For example, we can store artifacts on CI. So using that for caching is possible (without needing to create that from scratch).

@carterbox (Member) left a comment:

I think this will work, but as you mentioned it will break any of the 7 existing feedstocks that use the openstack server for CUDA builds. They will need to update their conda_build_config.yaml to have three items in github_actions_labels. Otherwise, they will get a rendering error about the zip lengths not being equal.

@h-vetinari (Member Author)

@hmaarrfk, could you live with this for now?

@isuruf (Member)

isuruf commented Jan 21, 2025

Which feedstocks are you planning to use it for? If it's small, you could do it as a migrator.

@isuruf (Member)

isuruf commented Jan 21, 2025

Also, is it possible to only set this in the feedstock recipe/conda_build_config.yaml and extend the zip_keys there?

@h-vetinari (Member Author)

h-vetinari commented Jan 21, 2025

> Also, is it possible to only set this in the feedstock recipe/conda_build_config.yaml and extend the zip_keys there?

Unfortunately not, because it's not possible to extend an existing zip_key (I tried) [1]. Cf. the POC branch that was rendered with the CBC from this PR.

> Which feedstocks are you planning to use it for? If it's small, you could do it as a migrator.

Primarily pytorch and tensorflow for now.

Footnotes

  [1] I guess it should be possible in principle (if all the keys of the to-be-overwritten zip are present), but this would need work in conda-build.
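For illustration, this is roughly the kind of feedstock-level extension that was attempted, together with a matching three-entry github_actions_labels list like the one sketched further up (placeholders only, to make the limitation concrete):

# in the feedstock's recipe/conda_build_config.yaml (does not work today)
zip_keys:                     # [linux]
-                             # [linux]
  - cuda_compiler             # [linux]
  - cuda_compiler_version     # [linux]
  - github_actions_labels     # [linux]

Presumably conda-build treats this as a separate zip group conflicting with the pinning's existing one (which already contains cuda_compiler) rather than merging the two; per the footnote above, making that merge work would need changes in conda-build.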

@isuruf (Member)

isuruf commented Jan 21, 2025

We could just do

skip: True   # [linux and (cuda_compiler_version == "None") == (github_actions_labels == "cirun-openstack-gpu-placeholder")]

after adding

github_actions_labels:                                      # [linux]
  - cirun-openstack-cpu-placeholder                 # [linux]
  - cirun-openstack-gpu-placeholder                 # [linux]

in the feedstock
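For reference, a rough sketch of how that workaround plays out, assuming the Linux CUDA lineup at the time is None / 11.8 / 12.6: because the two-entry label list is not zipped with cuda_compiler_version, conda-build multiplies it out against the three CUDA slots, and the skip selector then prunes the mismatched combinations:

# cuda_compiler_version   github_actions_labels              result of the skip selector
# None                    cirun-openstack-cpu-placeholder    kept    (True == False is False)
# None                    cirun-openstack-gpu-placeholder    skipped (True == True)
# 11.8                    cirun-openstack-cpu-placeholder    skipped (False == False)
# 11.8                    cirun-openstack-gpu-placeholder    kept    (False == True is False)
# 12.6                    cirun-openstack-cpu-placeholder    skipped (False == False)
# 12.6                    cirun-openstack-gpu-placeholder    kept    (False == True is False)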

@h-vetinari (Member Author)

That seems pretty hacky to me (aside from the fact that smithy does not prune the jobs correctly - I tested it).

I mean, if the resistance to this proposal is strong, I'm not gonna die on this hill, but to me it seems natural to add the runner-type (CPU vs. CUDA) to the same zip that decides whether it's a CPU or a CUDA build.

@hmaarrfk (Contributor)

Looking at Pytorch v2.4.x (conda-forge/pytorch-cpu-feedstock#325):

  • The CPU jobs take about 3 hours to run (2 jobs).
  • The CUDA jobs take 9 hours (4 jobs).

So to save 6 out of 42 hours, we are adding complexity to anybody zipping keys at conda-forge. It just doesn't seem smart to me.

I'm not going to vote "for this approach", but I don't want to block it. I'm just a little tired of fighting with these zipped keys.

Let's keep it simple and not spend precious human resources discussing this.

IMO: just request the CPU runners on pytorch for things like the 2.4.x maintenance branch and one should be fine. We could also create ways to temporarily reduce the cuda target architectures during debugging sessions. These are "pytorch only" solutions that don't need to be imposed on the rest of conda-forge.

@h-vetinari (Member Author)

TL;DR: an extra zip entry costs ~nothing, but without it we'd pay the extra runtime for every CI run.

For pytorch main, the split is 3x CPU + 3x CUDA builds, and the aarch CPU build got longer because we're running parts of the test suite in emulation. So it's something like 12h of runtime out of a total of 39, or almost a third. I don't care much what we do on the maintenance branches, but main should be relatively pain-free to rebuild, and IMO we shouldn't accept an avoidable ~50% increase in runtime.

The simple fact that the GPU agents are essentially our rarest and most precious CI resource would IMO be enough reason to do this - even if it were only for pytorch (which it isn't, because tensorflow will be in the same situation and has an even bigger matrix).

> we are adding complexity to anybody zipping keys at conda-forge.

This affects a grand total of 5 feedstocks in conda-forge, 2 of which would benefit from it. 🤷

Overall, if this is so contentious, then I'll just go with Isuru's suggestion - but then don't start complaining about the superfluous CI workflows that this creates (half of which get skipped as soon as they hit conda-build) -- I find that quite a bit more painful than an extra zip entry, but de gustibus non est disputandum I guess...

@h-vetinari (Member Author)

> I'm just a little tired of fighting with these zipped keys.

FWIW, I'm happy to help out with issues on these things if you ping me. I haven't run into any unsolvable cases yet (well, aside from extending the zip locally, which conda-build doesn't support).

@h-vetinari (Member Author)

Welp, given the concerns here I was planning to use Isuru's workaround, but on the pytorch feedstock that actually breaks Windows:

Checking out the ref
  "C:\Program Files\Git\bin\git.exe" checkout --progress --force -B main refs/remotes/origin/main
  Error: error: unable to create file .ci_support/linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-cpu-2xlargeis_rcFalse.yaml: Filename too long
  Error: error: unable to create file .ci_support/linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-gpu-2xlargeis_rcFalse.yaml: Filename too long
  [...]

Perhaps this is just a question of needing to add

git config --system core.longpaths true

on the Windows server? Any chance we could try to add that @baszalmstra @wolfv?

@hmaarrfk (Contributor)

I don’t want to block this. Please disregard my concerns.

@isuruf (Member)

isuruf commented Jan 26, 2025

We should probably reduce 250 at https://github.com/conda-forge/conda-smithy/blob/5228b1965e6c4ff2a4891aa238ff6311a15908b7/conda_smithy/configure_feedstock.py#L797

@h-vetinari (Member Author)

> We should probably reduce 250 at https://github.com/conda-forge/conda-smithy/blob/5228b1965e6c4ff2a4891aa238ff6311a15908b7/conda_smithy/configure_feedstock.py#L797

Thanks for the proposition. I tested this in conda-forge/pytorch-cpu-feedstock#332 but even with some more careful changes in conda-forge/conda-smithy#2233, this either shortens the variant configs to the point that the key attributes (is it a CUDA build?) wouldn't be visible from the filename anymore, or still runs into issues with the filename lengths.

Despite the fact that the enthusiasm here is not exactly high, I think this is the right way to go about it conceptually (the zip that determines CUDA/non-CUDA is the canonical place for this distinction), and that the work-arounds are worse (filename length issues, aside from pointlessly doubling the number of linux jobs, etc.) than what they're meant to avoid (a single extra zip entry).

So I'll merge this and fix up the handful of affected feedstocks. The changes are minimal, e.g. I tested that this works for jaxlib:

--- a/recipe/conda_build_config.yaml
+++ b/recipe/conda_build_config.yaml
@@ -8,5 +8,8 @@ MACOSX_SDK_VERSION:        # [osx and x86_64]
 - '10.14'                  # [osx and x86_64]
 c_stdlib_version:          # [osx and x86_64]
 - '10.14'                  # [osx and x86_64]
-github_actions_labels:
-- cirun-openstack-cpu-large
+
+github_actions_labels:      # [linux]
+- cirun-openstack-cpu-large # [linux]
+- cirun-openstack-cpu-large # [linux]
+- cirun-openstack-cpu-large # [linux]

@h-vetinari merged commit ff043bb into conda-forge:main on Jan 26, 2025
3 checks passed
@h-vetinari deleted the cpu_vs_gpu branch on January 26, 2025, 22:15
@isuruf (Member)

isuruf commented Jan 26, 2025

Please don't merge things like this on your own without approval

@h-vetinari (Member Author)

There were other approvals, and Mark withdrew his objections. You proposed alternatives but didn't request changes, much less provide arguments why this would be wrong. I'm willing to do the required clean-ups on the 5 affected feedstocks too, so this is a non-issue, unless you have a very concrete argument why this is the wrong approach.

@h-vetinari (Member Author)

Also: I've tried all the suggested alternatives beforehand as well, and they were worse than what they were working around.

@isuruf (Member)

isuruf commented Jan 26, 2025

I still have Mark's objection. The zip_keys for docker_image and related ones are getting out of hand. I'd like to one day remove docker_image from that zip_keys and adding something like this would make it impossible to do down the road.

@h-vetinari (Member Author)

> I'd like to one day remove docker_image from that zip_keys and adding something like this would make it impossible to do down the road.

I agree on this goal! And there's nothing standing in the way there AFAICT - as soon as we drop CUDA 11.8, we'll be able to drop docker_image from the zip; the image and the GHA label are not coupled (i.e. both non-CUDA and CUDA 12.6 use the alma9 image).

@isuruf (Member)

isuruf commented Jan 26, 2025

Sure, but this zip_key is still getting out of hand, and I'd like to keep it as simple as possible.

@isuruf (Member)

isuruf commented Jan 26, 2025

This is a conversation we need to have in an issue and I think the best course of action is to revert this PR, open an issue and if needed discuss in a core meeting.

@h-vetinari (Member Author)

> Sure, but this zip_key is still getting out of hand, and I'd like to keep it as simple as possible.

We recently removed c_stdlib_version and cdt_name. So it's getting simpler already.

If we keep using the newest CUDA, we might even get to drop {c,cxx,fortran}_compiler_version from the zip. My point is that github_actions_labels fundamentally belongs in the same zip as cuda_compiler_version, and the results of working around this are worse than the impact it's trying to avoid.

@h-vetinari (Member Author)

Wrote up an issue: #6967
